We study dynamical reversibility in stationary stochastic processes from an information theoretic 
perspective. Extending earlier work on the reversibility of Markov chains, we focus on finitary pro- 
cesses with arbitrarily long conditional correlations. In particular, we examine stationary processes 
represented or generated by edge-emitting, finite-state hidden Markov models. Surprisingly, we 
find pervasive temporal asymmetries in the statistics of such stationary processes with the conse- 
quence that the computational resources necessary to generate a process in the forward and reverse 
temporal directions are generally not the same. In fact, an exhaustive survey indicates that most 
stationary processes are irreversible. We study the ensuing relations between model topology in 
different representations, the process's statistical properties, and its reversibility in detail. A process's
temporal asymmetry is efficiently captured using two canonical unifilar representations of the
generating model, the forward-time and reverse-time ε-machines. We analyze example irreversible
processes whose ε-machine presentations change size under time reversal, including one which has
a finite number of recurrent causal states in one direction, but an infinite number in the opposite.
From the forward-time and reverse-time ε-machines, we are able to construct a symmetrized, but
nonunifilar, generator of a process: the bidirectional machine. Using the bidirectional machine, we
show how to directly calculate a process's fundamental information properties, many of which are 
otherwise only poorly approximated via process samples. The tools we introduce and the insights 
we offer provide a better understanding of the many facets of reversibility and irreversibility in 
stochastic processes. 

One of the principal early mysteries of thermodynamics
was the origin of irreversibility: While microscopic
equations of motion describe behaviors that are the same
in both time directions, why do large-scale systems
exhibit temporal asymmetries? Many thermodynamic
processes go in one direction: Closed systems devolve
from order to disorder, heat flows from high temperature
to low temperature, and shattered glass does not
reassemble itself spontaneously. These are described as
transient relaxation processes in which a system moves
from one macroscopic state to another with high
probability since there is an overwhelming number of
microscopic configurations that realize the eventual state.

Here, we analyze a generalized notion of irreversibility:
Behavior in reverse time gives rise to a different
stochastic process than that in forward time. This
dynamical irreversibility subsumes relaxation, but is not
so constrained, since it can occur in a nonequilibrium
steady state. A dynamical parallel to the shattered glass
example of transient relaxation is found in a
"continuous-flow" glass grinder: Continuously fed whole
glass, the grinder eventually produces glass pieces that
are sufficiently small to pass out via a sieve. After a
transient start-up time, the distribution of glass sizes
settles down to a steady state. The glass grinding process
is dynamically irreversible.

We explore irreversibility in stationary stochastic
systems using new tools from information theory and
computational mechanics. We show that a system's causal
structure and information storage depend on time's arrow,
while its rate of generating information does not. We
develop a time-symmetric representation, the bidirectional
machine, that allows one to directly determine key
informational and computational properties, including how
much stored information is hidden from observation, the
number of excess statistical degrees of freedom, the
amount of internal information that anticipates future
behavior, and the like. We summarize the analysis via a
new irreversibility classification scheme for stochastic
processes. Overall, the result is an enriched view of
irreversibility and its companion properties, a view that
enhances our understanding of the relationship between
energy and information and of the structure of the
physical substrates that carry them.



I. INTRODUCTION 

Dynamical systems, by definition, evolve in time. 
Practically all of what we may know about a system is 
derived from careful observation of its change in time. 
In their attempt to understand underlying mechanisms, 
physicists cast observations in the language of mathemat- 
ics, spelling out "equations of motion" to model how a 
system's temporal behavior arises from the forces acting 
on it. In some settings, such modeling allows for fore- 
casting a system's behavior given its current state, but 
also allows for tracing its evolution backward in time. 

The equations of motion of classical mechanics meet 
this ideal; they are dynamically reversible. From cur- 
rent observations of the sky, we are able to precisely de- 
termine planet motions hundreds of years into the past 
and future; we can determine the future course of mete- 
orites, but also where they came from. This dynamical 
reversibility is tied to the fact that the mechanical equa- 
tions of motion provide an invertible one-to-one mapping 
of a system's current state to its future state; that is, 
they specify a deterministic dynamic. Given a mechanical
system's current state at a single instant, Laplace's
Daemon, in principle, can predict the system's entire future
and entire past [1].

In practice, the limited precision to which we can spec- 
ify initial conditions and the often high sensitivity of the 
equations of motion with respect to changes of the initial 
conditions restrict our ability to predict a system's future 
or to reconstruct its history over a long period of time. 

Acknowledging this fundamental limitation, statisti- 
cal mechanics introduced the distinction between a sys- 
tem's macroscopic state and its microscopic state. For 
example, the precise momenta and positions of the par- 
ticles in a gas container form the system's microstate, 
whose behavior is governed by deterministic, reversible 



dynamics. Only averages over these microstates are accessible
to us, though, being measured as pressure, temperature,
volume, and the like. These thermodynamic-state
variables in turn describe the system's macrostate
and their interdependence is given by the thermody- 
namic equations of state. Interestingly, once a thermo- 
dynamic system reaches equilibrium — and only then are 
the thermodynamic-state variables defined and the state 
equations valid — we have no way to discover the system's 
past. That is, we cannot know how the system reached 
this thermodynamic state since it is not possible to trace 
back the system's evolution from its current state: Its 
macroscopic dynamics are irreversible [2]. Equilibrium 
thermodynamics then leaves us with a description of a 
system in terms of thermodynamic macrostates that are 
entirely devoid of traces of the system's past and that are 
trivial with respect to the system's further evolution. 

Between the extremes of deterministic mechanical sys- 
tems with their complete reversibility and thermody- 
namic systems that do not admit reconstructing the 
system's past, we find stochastic processes. Stochastic 
processes exhibit nontrivial, nondeterministic dynamical 
evolution that combines the ability to reconstruct histor- 
ical evolution and to forecast future behavior in a prob- 
abilistic setting. 

Here, developing an information-theoretic perspective,
we study the reversibility of stochastic processes; specif- 
ically, our ability to make assertions about a process's 
past from current and future observations. We contrast 
the act of reconstructing a process's past based on cur- 
rent and future observations (retrodiction) with that of 
forecasting a process's future based on past and current 
observations (prediction). We show the two tasks exhibit 
a number of unexpected and nontrivial asymmetries. In 
particular, we show that in contrast to deterministic dy- 
namical systems, where the forward and reverse evolu- 
tion can be computed at the same computational cost — 
solving a differential equation — predicting and retrodict- 
ing a stochastic process's evolution may come at very dif- 
ferent computational costs. More precisely, we show that 
the canonical generators of stochastic processes, their 
"equations of motion" so to speak, are generally far from 
invariant under time reversal. Via an exhaustive sur- 
vey, we demonstrate that irreversibility is an overwhelm- 
ingly dominant property of structurally complex stochas- 
tic processes. This asymmetry shows that depicting pro- 
cesses only by either their forward or reverse generators 
typically does not provide a complete description. This 
leads us to introduce a time-symmetric representation of 
a stochastic process that allows a direct calculation
of key informational and computational quantities asso- 
ciated with the process's evolution in forward and back- 
ward time directions. With these tools at hand, we are 



then able to establish a novel classification of stochastic 
processes in terms of their reversibility, providing new 
insights into the diversity of information processing em- 
bedded in physical systems. 

Of fundamental importance for our discussion is the 
notion of the "state" of a probabilistic process and the 
use of state-based models — the so-called generators — to 
describe stochastic processes. We introduce these in 
Section II. Continuing in more familiar territory, Section
III reviews reversibility in processes whose generators
have states which can be directly observed, the so-called
Markov chains. Section IV expands the discussion
to a broader class of models, the hidden Markov mod- 
els (HMMs), whose states cannot be directly observed. 
There, we utilize the information measures from Ref. [5] 
to describe ways in which a process hides internal struc- 
ture from observations. Then we draw out the differences 
between models of processes with and without observable 
states. In this, we confront the issue of process structure. 
This leads Sec. V to introduce a canonical representation
for each process, the ε-machine. At this point, irreversibility
of HMMs becomes necessarily tied to properties
of the ε-machine. There, we introduce the ε-machine
information diagram which is a useful roadmap for the 
various information measures and corresponding process 
properties. A number of example processes are analyzed 
to help ground the concepts introduced up to this point. 
A new presentation is required to go further, however, 
and Sec. VII introduces and analyzes a process's bidirectional
machine using Ref. [3]'s information measures.
Finally, we conclude by drawing out the thermodynamic 
implications for these notions of irreversibility and com- 
menting on its role in applications. 



II. PROCESSES AND GENERATORS 

To keep our analysis of irreversibility constructive, 
our focus here is on discrete-time, discrete-valued sta- 
tionary processes and their various alternate represen- 
tations. This class includes the symbolic dynamics of 
chaotic dynamical systems, one-dimensional spin chains, 
and cellular automata spatial configurations, to men- 
tion three well-known, complex applications. Histori- 
cally, one-dimensional stochastic processes were studied 
using generators — models that reproduce the process's 
statistics in a time-ordered sequence. The tradition of 
using generators is so strong that their time-order is of- 
ten treated as synonymous with the process's time-order 
which, as the following will remind the reader, need not 
exist. Much of the following requires that we loosen the 
seemingly natural assumption of time-order. 

To begin, we define processes strictly in terms of probability
spaces [1]. Consider the space $\mathcal{A}^{\mathbb{Z}}$ of bi-infinite
sequences consisting of symbols from $\mathcal{A}$, a finite set known
as the alphabet. Taking $\mathbb{X}$ to be the $\sigma$-field generated by
the cylinder sets of $\mathcal{A}^{\mathbb{Z}}$, we assign probabilities to sets in
$\mathbb{X}$ via a measure $\mu$. The 3-tuple $(\mathcal{A}^{\mathbb{Z}}, \mathbb{X}, \mu)$ is a probability
space that we refer to as a process, denoting it $\mathcal{P}$.

Let $X_i$ denote the random variable that describes
the outcomes at index $i$. As a convenient shorthand
[5], we denote random variable blocks as
$X_{i:j} = X_i X_{i+1} \cdots X_{j-1}$, $j \geq i$. When $j = i$, the block has length
zero and this is used to keep definitions simple.

For example, consider a process with alphabet $\mathcal{A} = \{a, b, c\}$
for which the word $w = abc$ has the corresponding
cylinder set $\{x \in \mathcal{A}^{\mathbb{Z}} \mid X_0 = a, X_1 = b, X_2 = c\}$. The
probability of $w$ is defined to be the probability of its
cylinder set in $\mathbb{X}$:

$$P(X_{0:3} = w) = P(X_0 = a, X_1 = b, X_2 = c)
               = \mu(\{x \in \mathcal{A}^{\mathbb{Z}} \mid X_0 = a, X_1 = b, X_2 = c\}).$$

Notice that time does not appear explicitly in the def- 
inition of a process as a probability space. Indeed, the 
indexing of Xi can refer, for example, to locations on a 
spatial lattice. 

While one need not interpret a process in terms of time,
temporal interpretations are often convenient. The random
variable block leading up to "time" $t$ is referred to
as the past and denoted $X_{:t} = \ldots X_{t-3} X_{t-2} X_{t-1}$. Everything
from $t$ onward is referred to as the future and
denoted $X_{t:} = X_t X_{t+1} X_{t+2} \cdots$. We restrict ourselves to
stationary processes by demanding that $P$ yield the same
probabilities for blocks whose indices are shifts of one another:
$P(X_{0:L}) = P(X_{t:t+L})$ for all $t$ and $L$. When considering
generative models, we work with a semi-infinite
sequence of random variables, but due to stationarity,
the distribution can be uniquely extended to a probability
distribution over bi-infinite sequences [4].

Generators are dynamical systems and so time, as a 
concept, is fundamental. That being said, there are two 
natural and, generally, distinct ways of generating a pro- 
cess. When the time order of the generator coincides 
with the process's index, which increases (a priori) left- 
to-right, the model is a forward generator of the process. 
When its time order is the opposite of the process's index, 
the model is a reverse generator. 

Given a process, if we isolate a block of symbols, that 
block's probability is the same using the forward and 
reverse generators. The only difference is in how the 
block indices are interpreted [S]. On occasion, it will 
be helpful to consider a random variable whose index 
increases when scanning right-to-left, as this corresponds
to increasing time from a reverse generator's frame of 
reference. Such random variables will be decorated with 



a tilde, as in $\widetilde{X}_t$.

One might object to this detailed level of distinction 
on the grounds that different indexings mean that, in 
fact, we have two different processes — processes that are 
coupled to one another under time reversal. We acknowl- 
edge this point, but simplicity later on leads us to choose 
to refer to the process and, additionally, its forward and 
reverse generators. 

Finally, note that our terminology (past and future)
smacks of privileging the process's forward generator. Indeed,
the reverse generator has its own "past" which corresponds
to the forward generator's "future". Generally
though, we avoid basis-shifting discussions and continue 
to use the biased terminology in prose, definitions, and 
figures. The result is that one must consciously transform 
scanning process variables from one way to the other. 
Examples will exercise this and so help clarify the issue. 



III. GENERATORS WITH OBSERVABLE 
STATES 

We review basic results about Markov processes, their 
reversibility, and their models — Markov chains. In 
Markov chains, the states of the system are defined to 
be the system observables and, so, Markov chains are 
models of Markov processes whose states are observable. 
For a more detailed treatment see Ref. [7]. 



A. Definitions 

A finite Markov process is a sequence of random variables
$X_0 X_1 \ldots$, each taking values from a finite set $\mathcal{A}$.
However, the sequence is constrained such that the probability
of any symbol depends only on the most recently
seen symbol. Thus, for $x, y \in \mathcal{A}$, $w \in \mathcal{A}^{L-1}$, and $L \in \mathbb{N}$,
we have:

$$P(X_L = y \mid X_{0:L} = wx) = P(X_L = y \mid X_{L-1} = x).$$

Assuming stationarity, a finite Markov process is
uniquely specified by a right-stochastic matrix:

$$T(x, y) = P(X_1 = y \mid X_0 = x).$$

This matrix defines the model class of Markov chains.
As an example, consider the Golden Mean Process [8],
whose Markov chain is shown in Fig. 1(a). It has state
space $\mathcal{A} = \{0, 1\}$ and its state transitions are labeled by
$T(x, y)$. This Markov chain is irreducible since from each
state one can reach any other state by following successive
transitions. We work only with irreducible Markov
chains in the following.

FIG. 1. (a) An irreducible Markov chain $M$ of the Golden
Mean Process, which consists of all binary sequences with no
consecutive 0s. (b) Its time-reversed chain $\widetilde{M}$ which, in this
case, is the same as the original chain.

Every irreducible finite Markov chain has a unique stationary
distribution $\pi$ over $\mathcal{A}$ obeying:

$$\pi(y) = \sum_{x \in \mathcal{A}} \pi(x) T(x, y) \quad \text{for all } y \in \mathcal{A}.$$

In matrix notation, we simply write $\pi = \pi T$. The
Golden Mean Markov chain has stationary distribution
$\pi = (2/3, 1/3)$, where $P(X_0 = 1) = \pi(1) = 2/3$.

We calculate the probability of any word $x_{0:L}$ by factoring
the joint probability $P(X_{0:L} = x_{0:L})$ into a product
of conditional probabilities. An application of the
Markov property reduces the calculation to:

$$P(X_0 = x_0, X_1 = x_1, \ldots, X_{L-1} = x_{L-1})
  = \pi(x_0) T(x_0, x_1) T(x_1, x_2) \cdots T(x_{L-2}, x_{L-1})
  = \pi(x_0) \prod_{t=1}^{L-1} T(x_{t-1}, x_t).$$

More generally, one considers order-$R$ Markov processes
for which the next symbol depends on the previous
$R$ symbols. (See App. A.) Although these can be
shown to be equivalent to a standard Markov chain over
a larger state space, we avoid this approach and consider
the Markov order as a property of the process. When the
next symbol depends on the entire past, though, then $R$
is infinite and the Markov chain, in effect, has an infinite
number of states. In Sec. IV we show how hidden Markov
models can be used to represent many such chains, while
utilizing only a finite state space.
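
As a minimal sketch of the stationarity condition and the word-probability product formula for the Golden Mean chain, the following uses exact rational arithmetic. The value T(1,0) = 1/2 is the one implied by the quoted stationary distribution (2/3, 1/3); the dictionary layout and the names `T`, `PI`, `is_stationary`, and `word_probability` are illustrative assumptions, not notation from the paper.

```python
from fractions import Fraction
from itertools import product

# Golden Mean chain: no two consecutive 0s. T(1,0) = 1/2 is the value implied
# by the quoted stationary distribution pi = (2/3, 1/3); names and layout
# here are illustrative only.
T = {
    ("1", "1"): Fraction(1, 2), ("1", "0"): Fraction(1, 2),
    ("0", "1"): Fraction(1), ("0", "0"): Fraction(0),
}
PI = {"1": Fraction(2, 3), "0": Fraction(1, 3)}


def is_stationary(pi, T):
    """Check pi(y) == sum_x pi(x) T(x, y) for every symbol y."""
    return all(sum(pi[x] * T[(x, y)] for x in pi) == pi[y] for y in pi)


def word_probability(word, pi, T):
    """P(X_{0:L} = word) = pi(x_0) * prod_{t=1}^{L-1} T(x_{t-1}, x_t)."""
    if len(word) == 0:
        return Fraction(1)  # the length-zero block
    p = pi[word[0]]
    for prev, nxt in zip(word, word[1:]):
        p *= T[(prev, nxt)]
    return p


if __name__ == "__main__":
    print("stationary:", is_stationary(PI, T))
    for w in product("01", repeat=2):
        print("".join(w), word_probability(w, PI, T))
```

Words containing the forbidden block 00 receive probability zero, and the probabilities of all words of a fixed length sum to one.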



B. Reversibility 

An intuitive definition of a reversible Markov process is
that it should be indistinguishable (in probability) from
the same process run backwards in time. Thus, we define
a Markov process as reversible if and only if for all
$w \in \mathcal{A}^L$ and all $L \in \mathbb{N}$, we have:

$$P(X_{0:L} = w) = P(X_{0:L} = \widetilde{w}), \tag{1}$$

where $w = w_0 \ldots w_{L-1}$ and $\widetilde{w} = w_{L-1} \ldots w_0$ is its reversal.

Given a Markov process, if the transition matrix of its
unique chain obeys:

$$\pi(x) T(x, y) = \pi(y) T(y, x), \tag{2}$$

for all $x, y \in \mathcal{A}$, then we say the Markov chain is in
detailed balance. Note that the uniqueness of the chain
allows us to associate detailed balance with the Markov
process as well. The Markov chain representation of the
Golden Mean Process in Fig. 1(a) is in detailed balance.
It turns out that a stationary, finite Markov process is
reversible if and only if its Markov chain, as specified by
$T$, is in detailed balance [9]. To see this in one direction,
assume detailed balance, then:

$$P(X_{0:L} = w) = \pi(w_0) \prod_{t=1}^{L-1} T(w_{t-1}, w_t)
               = \pi(w_{L-1}) \prod_{t=L-1}^{1} T(w_t, w_{t-1})
               = P(X_{0:L} = \widetilde{w}).$$

Conversely, if the Markov process is reversible, then by
considering only words of length two we have
$P(X_0 = x, X_1 = y) = P(X_0 = y, X_1 = x)$. This is exactly the
statement of detailed balance.
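
The equivalence between detailed balance and reversibility can be checked numerically on the Golden Mean chain. The sketch below tests Eq. (2) directly and then compares word probabilities with those of the reversed words by brute force; the transition values are the ones implied by the stationary distribution quoted earlier, and all function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Golden Mean chain, with values implied by pi = (2/3, 1/3); names are illustrative.
T = {
    ("1", "1"): Fraction(1, 2), ("1", "0"): Fraction(1, 2),
    ("0", "1"): Fraction(1), ("0", "0"): Fraction(0),
}
PI = {"1": Fraction(2, 3), "0": Fraction(1, 3)}


def in_detailed_balance(pi, T):
    """Eq. (2): pi(x) T(x, y) == pi(y) T(y, x) for all pairs x, y."""
    return all(pi[x] * T[(x, y)] == pi[y] * T[(y, x)] for x in pi for y in pi)


def word_probability(word, pi, T):
    """pi(w_0) * prod_t T(w_{t-1}, w_t) for a non-empty word."""
    p = pi[word[0]]
    for prev, nxt in zip(word, word[1:]):
        p *= T[(prev, nxt)]
    return p


def reversible_up_to(pi, T, max_len):
    """Eq. (1) checked by brute force for every word of length <= max_len."""
    for length in range(1, max_len + 1):
        for w in product(pi, repeat=length):
            if word_probability(w, pi, T) != word_probability(w[::-1], pi, T):
                return False
    return True


if __name__ == "__main__":
    print("detailed balance:", in_detailed_balance(PI, T))
    print("reversible up to length 6:", reversible_up_to(PI, T, 6))
```

The brute-force check is exponential in the word length and is meant only as an illustration; for Markov chains the detailed balance test alone suffices.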

Given a Markov chain, we can use the condition for
detailed balance to define another chain that generates
words with the same probabilities as the original chain,
but in reverse order. If $T$ is the state transition matrix
of an irreducible Markov chain and $\pi$ is its unique stationary
distribution, then its time-reversed Markov chain
has state transition matrix given by:

$$\widetilde{T}(x, y) \equiv P(X_0 = y \mid X_1 = x) = \frac{\pi(y) T(y, x)}{\pi(x)}. \tag{3}$$

It is easy to see that if $\pi$ is stationary for $T$, then it is also
stationary for $\widetilde{T}$. Figure 1(b) shows the time-reversed
chain for the Golden Mean Process. It is the same as the
forward-time chain and, thus, is also in detailed balance.
Considering the time-reversed Markov chain as a generator,
we interpret:

$$\pi(x) \widetilde{T}(x, y) \widetilde{T}(y, z)$$

as the generator's probability of seeing $x$ followed by $y$
followed by $z$. In its local time perspective, we can represent
this as $\widetilde{X}_{0:3} = xyz$. By construction, our expectation
is that this probability should be equal to the probability
(as calculated by the forward generator) of seeing $x$ preceded
by $y$ preceded by $z$. That is, $X_{0:3} = zyx$. And so,
we can justify the designation of being the time-reversed
Markov chain by demonstrating that it does, indeed, generate
words in reverse time:

$$P(\widetilde{X}_{0:L} = w) = \pi(w_0) \prod_{t=1}^{L-1} \widetilde{T}(w_{t-1}, w_t)
                           = \pi(w_{L-1}) \prod_{t=L-1}^{1} T(w_t, w_{t-1})
                           = P(X_{0:L} = \widetilde{w}).$$

This result provides an alternative characterization of reversibility
in Markov processes: A Markov process is reversible
if and only if:

$$P(X_{0:L} = w) = P(\widetilde{X}_{0:L} = w). \tag{4}$$

Note that while Eq. (1) is a self-comparison test, Eq. (4)
is a comparison between two distinct Markov chains.
Also, observe that if a Markov chain is reversible, then
$\widetilde{T} = T$, due to detailed balance. Thus, a reversible
Markov chain is identical to its time-reversed Markov
chain. We return to this point when we define reversibility
for hidden Markov models.

What about irreversible Markov processes? A simple
example will suffice. Consider the process that generates
the periodic sequence ...ABCABCABC.... Note
that the time-reversed Markov chain differs: the forward
generator will emit AB but not BA, while the reverse
generator produces BA but not AB.

Finally, we comment briefly on the difference between
a Markov process and its associated Markov chain. The
Markov process exists in the abstract, describing a measure
over bi-infinite strings. The Markov chain is a one-sided
generator representation taking the form of a single
matrix. Within this class of representations, each
stationary and finite Markov process has exactly one
finite-state Markov chain. Markov processes can also be
represented in another model class, the hidden Markov
models, and within that model class, we will see that a
given Markov process can have multiple presentations.
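
The time-reversed chain of Eq. (3) and the periodic ABC example can be made concrete with a short sketch. It builds the reversed transition matrix from the forward one and confirms that the reversed generator assigns to each word the probability the forward generator assigns to the reversed word. The uniform stationary distribution follows from the period-3 cycle; the names `reverse_chain` and `word_probability` are hypothetical.

```python
from fractions import Fraction
from itertools import product

SYMBOLS = ("A", "B", "C")

# Period-3 chain ...ABCABC...: A -> B -> C -> A, each with probability one.
T = {(x, y): Fraction(0) for x in SYMBOLS for y in SYMBOLS}
T[("A", "B")] = T[("B", "C")] = T[("C", "A")] = Fraction(1)
PI = {s: Fraction(1, 3) for s in SYMBOLS}  # uniform over the cycle


def reverse_chain(pi, T):
    """Eq. (3): T~(x, y) = pi(y) T(y, x) / pi(x)."""
    return {(x, y): pi[y] * T[(y, x)] / pi[x] for x in pi for y in pi}


def word_probability(word, pi, T):
    """pi(w_0) * prod_t T(w_{t-1}, w_t) for a non-empty word."""
    p = pi[word[0]]
    for prev, nxt in zip(word, word[1:]):
        p *= T[(prev, nxt)]
    return p


T_REV = reverse_chain(PI, T)

if __name__ == "__main__":
    for w in ("AB", "BA"):
        print(w, "forward:", word_probability(w, PI, T),
              "reverse:", word_probability(w, PI, T_REV))
```

Here the reversed matrix differs from the forward one, so the chain is not in detailed balance and the process is irreversible.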



IV. GENERATORS WITH UNOBSERVABLE 

STATES 

In a similar manner, we now consider models of pro- 
cesses whose states are not directly observable, also 
known as hidden Markov models. Though rather less 
well understood than Markov processes, much progress 
has recently been made; for example, see Ref. [10]. Along
the way, we highlight differences between hidden Markov 
models and Markov chains — differences that force one to 
consider questions of structure very carefully. 



A. Definitions



FIG. 2. (a) The Golden Mean Process as a hidden Markov
model. The internal state set is $\mathcal{R} = \{A, B\}$ and the observation
alphabet is $\mathcal{A} = \{0, 1\}$. The transitions between states
specify $p|x$, where $p = P(X_0 = x, R_1 = \beta \mid R_0 = \alpha)$. (b) Its
time-reversed hidden Markov model is not the same. It is
nonunifilar, while the forward presentation is unifilar.



We begin with a Markov chain $R_0 R_1 R_2 \ldots$ over a finite
state set $\mathcal{R}$, the state alphabet. This chain is internal
to the hidden Markov model. Then, a finite-state
hidden Markov model (HMM) is a sequence of outputs
$X_0 X_1 X_2 \ldots$, each taking values from a finite set $\mathcal{A}$ that
we now call the output alphabet. The output sequence is
generated by the internal Markov chain through a set of
transition-output matrices, one matrix for each symbol
$x \in \mathcal{A}$. Each matrix element $T_x(\alpha, \beta)$ gives the probability
of transitioning from (internal) state $\alpha$ to state $\beta$ while
generating output $x \in \mathcal{A}$. That is,

$$T_x(\alpha, \beta) = P(X_0 = x, R_1 = \beta \mid R_0 = \alpha).$$

Note that the internal Markov chain's transition matrix
is the marginal distribution over the output symbol:

$$T(\alpha, \beta) = \sum_{x \in \mathcal{A}} T_x(\alpha, \beta) = P(R_1 = \beta \mid R_0 = \alpha).$$

If, for each $x \in \mathcal{A}$ and $\alpha \in \mathcal{R}$, there exists at most one
$\beta \in \mathcal{R}$ such that $T_x(\alpha, \beta) > 0$, then we say the hidden
Markov model is unifilar. An equivalent statement is that
the entropy of the next state, conditioned on the current
state and symbol, is zero: $H[R_1 \mid R_0, X_0] = 0$.
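
As a sketch of these definitions, the code below stores an edge-emitting HMM as one sparse labeled matrix per symbol, marginalizes to the internal chain, and tests unifilarity. The forward Golden Mean presentation used here (A emits 1 and stays, or emits 0 and moves to B, each with probability 1/2; B emits 1 and returns to A) is an assumption based on the standard form of this machine, since the figure itself is not reproduced; the second model is a hypothetical nonunifilar example. All names are illustrative.

```python
from fractions import Fraction

# Edge-emitting HMM as {symbol: {(alpha, beta): probability}}; missing entries are zero.
# Assumed forward Golden Mean presentation (standard two-state form).
GOLDEN_MEAN_HMM = {
    "1": {("A", "A"): Fraction(1, 2), ("B", "A"): Fraction(1)},
    "0": {("A", "B"): Fraction(1, 2)},
}
# A hypothetical nonunifilar model: from A, symbol 1 can lead to A or to B.
NONUNIFILAR_HMM = {
    "1": {("A", "A"): Fraction(1, 2), ("A", "B"): Fraction(1, 2)},
    "0": {("B", "A"): Fraction(1)},
}


def internal_chain(hmm):
    """Marginalize over symbols: T(alpha, beta) = sum_x T_x(alpha, beta)."""
    T = {}
    for matrix in hmm.values():
        for edge, p in matrix.items():
            T[edge] = T.get(edge, Fraction(0)) + p
    return T


def is_unifilar(hmm):
    """True if every (state, symbol) pair has at most one successor state."""
    for matrix in hmm.values():
        sources_seen = set()
        for (alpha, _beta), p in matrix.items():
            if p > 0:
                if alpha in sources_seen:
                    return False  # second successor for the same state and symbol
                sources_seen.add(alpha)
    return True


if __name__ == "__main__":
    print(internal_chain(GOLDEN_MEAN_HMM))
    print(is_unifilar(GOLDEN_MEAN_HMM), is_unifilar(NONUNIFILAR_HMM))
```

The sparse dictionary layout keeps the unifilarity test a direct transcription of the definition: at most one positive entry per row of each labeled matrix.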

The hidden aspect of a hidden Markov model refers 
to the fact that the internal Markov chain is not directly
observed; only the sequence of output symbols
$X_0 X_1 X_2 \ldots$ is seen. Note, the process associated with
a hidden Markov model refers only to the probability
distribution $P(\ldots X_0 X_1 X_2 \ldots)$ over the output symbols
$X_t$ and not over the joint process $(R_t, X_t)$.

Non-Markov processes differ from Markov processes
in that they exhibit arbitrarily long conditional correlations.
That is, the probability of the next symbol may
depend on the entire history leading up to this symbol.
Due to this, non-Markov processes cannot be represented
by finite-state Markov chains. One signature of (and motivation
for) hidden Markov models is that they can represent
many non-Markov processes finitely. So, whenever
a process (Markov or not) has a finite-state hidden
Markov model presentation, then we say that the process
is finitary.

There are a number of hidden Markov model variants. 
One common variant is a state-emitting hidden Markov 
model [11] . Another variant is an edge-emitting hid- 
den Markov model. State-emitting hidden Markov mod- 
els output symbols during state visitations, while edge- 
emitting hidden Markov models output symbols on the 
transitions between states. The two variants are equiva- 
lent 1:4. in that they represent the same class of processes 
finitely. In the following, we always refer to the edge- 
emitting variant. 

As before, we restrict our attention to hidden Markov
models whose underlying Markov chain is irreducible.
Thus, a hidden Markov model has a unique stationary
distribution $\pi$ satisfying $\pi = \pi \sum_x T_x = \pi T$ and, for
$\alpha \in \mathcal{R}$, $\pi(\alpha)$ represents the stationary probability of being
in internal state $\alpha$.

For comparison, Fig. 2(a) displays a hidden Markov
model for the Golden Mean Process. The internal state
set is $\mathcal{R} = \{A, B\}$ and the output alphabet is $\mathcal{A} = \{0, 1\}$.
The transitions between the states sport the labels $p|x$,
where $p = T_x(\alpha, \beta)$.

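The probability of any word is calculated as:

$$P(X_0 = x_0, \ldots, X_{L-1} = x_{L-1})
  = \sum_{\rho_0, \ldots, \rho_L} \pi(\rho_0) T_{x_0}(\rho_0, \rho_1) \cdots T_{x_{L-1}}(\rho_{L-1}, \rho_L).$$

In matrix form, with $T^w = T_{w_0} \cdots T_{w_{L-1}}$, we have

$$P(X_{0:L} = w) = \pi T^w \mathbf{1},$$

where $\mathbf{1} = (1 \ldots 1)^{\top}$ is the column vector of all ones.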
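
The matrix form of the word probability can be sketched as a sequence of row-vector updates, one per symbol, followed by a final sum that plays the role of the all-ones column vector. The Golden Mean presentation and its stationary distribution over internal states, (2/3, 1/3) for (A, B), are assumptions based on the standard form of this machine; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Assumed forward Golden Mean HMM: {symbol: {(alpha, beta): probability}}.
HMM = {
    "1": {("A", "A"): Fraction(1, 2), ("B", "A"): Fraction(1)},
    "0": {("A", "B"): Fraction(1, 2)},
}
PI = {"A": Fraction(2, 3), "B": Fraction(1, 3)}  # stationary over internal states


def word_probability(word, pi, hmm):
    """P(X_{0:L} = w) = pi T_{w_0} ... T_{w_{L-1}} 1, via row-vector updates."""
    v = dict(pi)
    for x in word:
        nxt = {state: Fraction(0) for state in pi}
        for (alpha, beta), p in hmm.get(x, {}).items():
            nxt[beta] += v[alpha] * p  # one step of v <- v T_x
        v = nxt
    return sum(v.values())  # right-multiplication by the all-ones vector


if __name__ == "__main__":
    for w in product("01", repeat=2):
        print("".join(w), word_probability(w, PI, HMM))
```

The values agree with those obtained from the Markov chain presentation, as they must, since both models generate the same process.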



The states $\mathcal{R}$ and observations $\mathcal{A}$ were synonymous in
Markov chains. The consequence of this was that every
finite Markov process was uniquely characterized by its
transition matrix $T$. With hidden Markov models, this
is no longer true. A given process, even a Markov process,
is not uniquely characterized by a set of transition
matrices $\{T_x\}$. To drive this point home, Sec. V provides
an example process that has an uncountable number
of presentations on a fixed, finite number of states.
This demonstrates the need for a canonical representation,
which is also introduced in Sec. V.



B. Reversibility 

In comparison to Markov chains, the literature on
reversibility for hidden Markov models is substantially
smaller and not nearly as detailed; see, for example,
Ref. [12].

Reversibility for Markov processes was defined, in
Eq. (1), such that the probability of every word equaled
the probability of the reversed word. We take this as
a general definition, applicable even to non-Markov processes.
Thus, a process is reversible if and only if for all
$w \in \mathcal{A}^L$ and all $L \in \mathbb{N}$, we have:

$$P(X_{0:L} = w) = P(X_{0:L} = \widetilde{w}), \tag{5}$$

where, as before, $\widetilde{w}$ is the reversal of $w$.

Detailed balance plays a central role in Markov chains
and their applications. The analogous local-equilibrium
property for hidden Markov models is more subtle and interesting.
We define detailed balance for a hidden Markov
model to mean that the following must hold for all $x \in \mathcal{A}$
and all $\alpha, \beta \in \mathcal{R}$:

$$\pi(\alpha) T_x(\alpha, \beta) = \pi(\beta) T_x(\beta, \alpha). \tag{6}$$



Trivially, if a hidden Markov model is in detailed bal- 
ance, then its internal Markov chain must also be in 
detailed balance. The converse, however, is not true. 
Also, whenever a hidden Markov model is in detailed 
balance, one can show that the process it generates is 
reversible. But unlike the Markov chain case, detailed 
balance is not equivalent to reversibility. And, quite gen- 
erally, the process generated by a hidden Markov model 
can be reversible even if the model is not in detailed balance
[13]. The hidden Markov model of Fig. 2(a) generates
a reversible process, the Golden Mean Process, but it is
not in detailed balance.
The contrapositive is perhaps more intriguing: Every irreversible
stationary process generated by a finite-state,
edge-emitting hidden Markov model is not in detailed balance
[13].

We can use the condition of detailed balance to inspire
a definition for the time-reversed hidden Markov
model. If $T_x$ are the labeled transition matrices of a
hidden Markov model and $\pi$ is its unique stationary distribution,
then its time-reversed hidden Markov model
has labeled transition matrices given by:

$$\widetilde{T}_x(\alpha, \beta) \equiv P(X_0 = x, R_0 = \beta \mid R_1 = \alpha)
  = \frac{\pi(\beta) T_x(\beta, \alpha)}{\pi(\alpha)}. \tag{7}$$

The time-reversed HMM for the Golden Mean Process is
given in Fig. 2(b), which is now nonunifilar.
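
A minimal sketch of the construction in Eq. (7), applied to an assumed forward Golden Mean presentation (the standard two-state unifilar machine, since the figure is not reproduced here). It computes the reversed labeled matrices and confirms that the result is nonunifilar, in line with the statement above. The names `reverse_hmm` and `is_unifilar` are hypothetical.

```python
from fractions import Fraction

# Assumed forward Golden Mean HMM: {symbol: {(alpha, beta): probability}}.
HMM = {
    "1": {("A", "A"): Fraction(1, 2), ("B", "A"): Fraction(1)},
    "0": {("A", "B"): Fraction(1, 2)},
}
PI = {"A": Fraction(2, 3), "B": Fraction(1, 3)}


def reverse_hmm(pi, hmm):
    """Eq. (7): T~_x(alpha, beta) = pi(beta) T_x(beta, alpha) / pi(alpha)."""
    reversed_hmm = {}
    for x, matrix in hmm.items():
        # A forward edge src -> dst becomes a reversed edge dst -> src.
        reversed_hmm[x] = {
            (dst, src): pi[src] * p / pi[dst] for (src, dst), p in matrix.items()
        }
    return reversed_hmm


def is_unifilar(hmm):
    """True if every (state, symbol) pair has at most one successor state."""
    for matrix in hmm.values():
        sources = [alpha for (alpha, _beta), p in matrix.items() if p > 0]
        if len(sources) != len(set(sources)):
            return False
    return True


HMM_REV = reverse_hmm(PI, HMM)

if __name__ == "__main__":
    print(HMM_REV)
    print("forward unifilar:", is_unifilar(HMM), "reverse unifilar:", is_unifilar(HMM_REV))
```

In the reversed model, state A on symbol 1 can lead to either A or B, which is exactly the loss of unifilarity described in the text.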

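As before, if $\pi$ is stationary for $T = \sum_x T_x$, then it is
also stationary for $\widetilde{T} = \sum_x \widetilde{T}_x$. To justify its designation
as the time-reversed hidden Markov model, we demonstrate
that it does indeed generate words in reverse time
and, thus, generates the time-reversed process:

$$P(\widetilde{X}_0 = x_0, \ldots, \widetilde{X}_{L-1} = x_{L-1})
  = \sum_{\rho_0, \ldots, \rho_L} \pi(\rho_0) \widetilde{T}_{x_0}(\rho_0, \rho_1) \cdots \widetilde{T}_{x_{L-1}}(\rho_{L-1}, \rho_L)
  = P(X_0 = x_{L-1}, \ldots, X_{L-1} = x_0).$$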

This result provides an alternative characterization of reversibility
which parallels that for Markov chains given
in Eq. (4). That is, a hidden Markov model is reversible
if and only if for all $w \in \mathcal{A}^L$ and all $L \in \mathbb{N}$, we have:

$$P(X_{0:L} = w) = P(\widetilde{X}_{0:L} = w), \tag{8}$$

indicating that the two hidden Markov models agree on
the probability of every word; cf. Eq. (5). Also, note
that if the hidden Markov model is in detailed balance,
then it equals the time-reversed hidden Markov model:
$\widetilde{T}_x = T_x$. We see that detailed balance is a structurally
restrictive property.

For Markov chains, determining if a process is re- 
versible amounted to checking for detailed balance. The 
situation is more complicated for hidden Markov mod- 
els but, curiously enough, there exists a straightforward 
procedure to check if two hidden Markov models generate 
the same process language. This is known as the identi- 
fiability problem [TS] , and its solution [H [Tni [TT] , though 
20 years old now, does not seem to be as well known. 
A crude test is to verify that the hidden Markov model 
and its time-reversed hidden Markov model agree on the 
probabilities of every word of length $L$, where $L < 2|\mathcal{R}|$
and $|\mathcal{R}|$ is the number of states in the model [Ij.
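
The crude test just described can be sketched as a brute-force comparison between a hidden Markov model and its time-reversed model. The code below checks all words up to length 2|R| inclusive, which covers the range stated above; this is an illustration of the crude test only, not of the identifiability algorithms cited. The two example models (an assumed Golden Mean presentation and a three-state period-3 cycle) and all names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def word_probability(word, pi, hmm):
    """pi T_{w_0} ... T_{w_{L-1}} 1, computed with row-vector updates."""
    v = dict(pi)
    for x in word:
        nxt = {state: Fraction(0) for state in pi}
        for (alpha, beta), p in hmm.get(x, {}).items():
            nxt[beta] += v[alpha] * p
        v = nxt
    return sum(v.values())


def reverse_hmm(pi, hmm):
    """Eq. (7): T~_x(alpha, beta) = pi(beta) T_x(beta, alpha) / pi(alpha)."""
    return {
        x: {(dst, src): pi[src] * p / pi[dst] for (src, dst), p in matrix.items()}
        for x, matrix in hmm.items()
    }


def passes_crude_reversibility_test(pi, hmm):
    """Compare the model and its reverse on every word up to length 2|R|."""
    rev = reverse_hmm(pi, hmm)
    alphabet = sorted(hmm)
    for length in range(1, 2 * len(pi) + 1):
        for w in product(alphabet, repeat=length):
            if word_probability(w, pi, hmm) != word_probability(w, pi, rev):
                return False
    return True


# Assumed Golden Mean presentation (reversible process).
GOLDEN = {
    "1": {("A", "A"): Fraction(1, 2), ("B", "A"): Fraction(1)},
    "0": {("A", "B"): Fraction(1, 2)},
}
GOLDEN_PI = {"A": Fraction(2, 3), "B": Fraction(1, 3)}

# Period-3 cycle ...ABCABC... on internal states 0 -> 1 -> 2 -> 0 (irreversible).
CYCLE = {"A": {(0, 1): Fraction(1)}, "B": {(1, 2): Fraction(1)}, "C": {(2, 0): Fraction(1)}}
CYCLE_PI = {0: Fraction(1, 3), 1: Fraction(1, 3), 2: Fraction(1, 3)}

if __name__ == "__main__":
    print("Golden Mean:", passes_crude_reversibility_test(GOLDEN_PI, GOLDEN))
    print("ABC cycle:", passes_crude_reversibility_test(CYCLE_PI, CYCLE))
```

The number of words grows exponentially with the length bound, so this sketch is practical only for small state sets and alphabets.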

Another interesting question is whether or not the re- 
versibility of the internal Markov chain has any effect 
on the reversibility of the observed process. As it turns 



out, the answer is no. Jumping ahead a bit, we note 
that the forward ε-machine in Fig. 9 has a reversible internal
Markov chain, but the observed process is irreversible.
Additionally, to any irreversible Markov chain,
we can simply assign the same symbol on each outgoing
edge. This creates a period-1 process that is definitely
reversible. So, the reversibility of the internal Markov 
chain can make no statement on the reversibility of the 
observed process. 




FIG. 3. The Golden Mean Process as a continuously
parametrized hidden Markov model. The internal state set
is $\mathcal{R} = \{A, B\}$ and the observation alphabet is $\mathcal{A} = \{0, 1\}$.
Each value of $z = P(B, 0 | A) \in [1/2, 1]$ defines a unique hidden
Markov model that generates the same process as the models
in Figs. 1(a) and 2(a).



V. STRUCTURE AND CANONICAL PRESENTATIONS



A. Decomposing the State 



Rarely does one work directly with a process. Need- 
less to say, specifying the probability of every word at 
every length is a cumbersome representation. Instead, 
one works with generators. However, one must be careful 
in choosing a representation for the latter. For example, 
the class of processes representable by finite-state hidden 
Markov models is strictly larger than the class of pro- 
cesses representable by finite-state Markov chains p^ . 
So, one cannot use Markov chain presentations in many 
cases. 

As previously noted, when the process can be repre- 
sented by a finite-state Markov chain, then that presen- 
tation is unique. If the process has no finite-state Markov 
chain representation, however, then there is a challeng- 
ing multiplicity of possible hidden Markov model pre- 
sentations to choose from, many with distinct structural 
properties. As an example, Fig. 3 gives a continuously
parametrized set of hidden Markov models for the Golden
Mean Process. Each value of $z = P(B, 0 | A) \in [1/2, 1]$ defines
a unique hidden Markov model that generates the
same Golden Mean Process. That is, $P(X_1 = \beta | X_0 = \alpha)$
is independent of $z$ and equal to the matrix $T(\alpha, \beta)$
that defined the Markov chain in Fig. 1(a). Note that
this is only a two-state hidden Markov model. It is 
possible to construct similar families with even more 
states. (The technique for constructing such continu- 
ously parametrized presentations for a given process will 
appear elsewhere.) 

This degeneracy serves to emphasize why a process's 
structure and that of its presentations deserve close at- 
tention. To appreciate this concern more deeply, we de- 
tour and examine structure explicitly. Then, we intro- 
duce e-machines and show how they provide a canonical 
presentation that, in addition to other benefits, resolves 
the degeneracy. Finally, we discuss additional notions of 
reversibility that are more closely tied to and calculable 
from e-machines. 



Reference [3] presented an information-theoretic anal- 
ysis of the relationship between a hidden Markov model's 
states and the process it generates. One of the main conclusions
was that the internal-state uncertainty $H[\mathcal{R}_0]$
can be decomposed into four independent components.
Here, we summarize the decomposition, assuming a minimal
amount of information theory. Reference [19] should
be consulted for background not covered here. Familiarity
with the block entropy, entropy rate, and excess
entropy as developed in Ref. [3] is also assumed.

By splitting a process's bi-infinite sequence of random 
variables into a past $X_{:0}$ and a future $X_{0:}$, we isolate the
information that passes through the present state $\mathcal{R}_0$.
As developed in Refs. [20] and [21], the statistical relationships
among these three (aggregate) variables are
concisely expressed using the information diagram technique
of Refs. [22] and [23]. Said briefly, a process's
Shannon entropies and mutual informations [23] form 
a measure over the associated event (sequence) spaces. 
Given this, the set-theoretic relationships between the 
measure's atoms are displayed in the Venn-like diagram. 

For a three-variable information diagram, we have 
three circles representing $H[X_{:0}]$, $H[X_{0:}]$, and $H[\mathcal{R}_0]$. In
total, this means that there are 7 atoms to consider.
However, since every hidden Markov model has an internal
Markov chain that governs generation, the past
and future are shielded from each other given the current
state. This is a probabilistic statement, but when
phrased in terms of conditional mutual information, we
have $I[X_{:0}; X_{0:} | \mathcal{R}_0] = 0$. A moment's reflection shows
that this is a way of saying that the hidden Markov model 
generates the process. This quantity can be nonzero only 
if we compare a process to the states of a hidden Markov 
model that generates a different process. 

The information diagram is shown in Fig. 4. There,
$H[X_{:0}]$ is represented by everything contained in the orange
circle [21]. The purple circle represents $H[X_{0:}]$ and

FIG. 4. Information diagram capturing all possible relationships
between the past $X_{:0}$, the future $X_{0:}$, and the present,
that is, the current internal state $\mathcal{R}_0$. The statistical complexity $C_\mu$,
excess entropy $\mathbf{E}$, crypticity $\chi$, oracular information $\zeta$, and
gauge information $\varphi$ appear as unions and intersections of
the resulting atoms.



the black circle, our focus, represents state information 
$H[\mathcal{R}_0]$. The figure contains an additional blue circle
that can be ignored until e-machines are introduced in
Sec. V B. So, absent the blue circle, we see that the state
information decomposes into four quantities. Specifically, 



$$H[\mathcal{R}_0] = \mathbf{E} + \chi + \zeta + \varphi, \tag{9}$$



where we have the: 



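1. Excess entropy: $\mathbf{E} \equiv I[X_{:0}; X_{0:}]$,

2. Crypticity: $\chi \equiv I[X_{:0}; \mathcal{R}_0 | X_{0:}]$,

3. Oracular information: $\zeta \equiv I[\mathcal{R}_0; X_{0:} | X_{:0}]$, and

4. Gauge information: $\varphi \equiv H[\mathcal{R}_0 | X_{:0}, X_{0:}]$.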

Excess entropy is a by-now standard measure of complexity
[26-30] that captures the shared information
between past and future observations. Crypticity is 
a relatively new measure of structure introduced in 
Refs. Pni inn ED . By comparing to the apparent information
that excess entropy measures, crypticity monitors
how much of the internal state information is hidden.
Oracular information, introduced in Ref. [3], measures
how much information a presentation provides that
can improve predictability, but that is not available from
the past. Finally, gauge information, also introduced in
Ref. [3], quantifies how much additional structural information
exists in a presentation that is not "justified" by
the past or the future. Taken together these quantities
provide an informational basis useful for analyzing the

FIG. 5. Decomposition of the state information $H[\mathcal{R}]$ contained
in the parametrized Golden Mean Process presentation
family of Fig. 3. As a function of $z = P(B, 0 | A) \in [1/2, 1]$,
the excess entropy $\mathbf{E}$, crypticity $\chi$, oracular information $\zeta$,
and gauge information $\varphi$ are stacked such that the top of the
curve is their sum $H[\mathcal{R}]$, the state entropy of the presentation
for the given value of $z$. The miniaturized information diagrams
are special cases of Fig. 4 tailored to $z$-values. From
left to right, we have $z = 1/2$, an intermediate value of $z$, and $z = 1$.



various kinds of structure a process or a process's pre- 
sentation contains. 
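
The decomposition can be checked numerically on any finite joint distribution in which the state shields the past from the future. The following is a minimal sketch under stated assumptions: the past and future are replaced by finite blocks (so the numbers only approximate the process-level quantities), the joint distribution is a hypothetical random toy example, and all names are illustrative.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a (possibly multi-dimensional) pmf."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def state_decomposition(p):
    """p[a, r, b]: joint pmf of (past block a, state r, future block b)."""
    H_a = entropy(p.sum(axis=(1, 2)))
    H_r = entropy(p.sum(axis=(0, 2)))
    H_b = entropy(p.sum(axis=(0, 1)))
    H_ar = entropy(p.sum(axis=2))
    H_ab = entropy(p.sum(axis=1))
    H_rb = entropy(p.sum(axis=0))
    H_arb = entropy(p)
    return {
        "H_state": H_r,
        "E": H_a + H_b - H_ab,                   # I[past; future]
        "chi": H_ab + H_rb - H_b - H_arb,        # I[past; state | future]
        "zeta": H_ab + H_ar - H_a - H_arb,       # I[state; future | past]
        "phi": H_arb - H_ab,                     # H[state | past, future]
        "shielding": H_ar + H_rb - H_r - H_arb,  # I[past; future | state]
    }

# Hypothetical toy joint: p(a, r, b) = p(r) p(a | r) p(b | r), so the state
# shields the past block from the future block by construction.
rng = np.random.default_rng(0)
p_r = rng.dirichlet(np.ones(3))
p_a_given_r = rng.dirichlet(np.ones(4), size=3)
p_b_given_r = rng.dirichlet(np.ones(4), size=3)
p = np.einsum("r,ra,rb->arb", p_r, p_a_given_r, p_b_given_r)
d = state_decomposition(p)
```

In general the four terms sum to the state entropy plus the shielding term, so the identity in Eq. (9) holds exactly when the conditional mutual information between past and future given the state vanishes.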

To see this, we can apply these structural complexity 
measures to the Golden Mean Process presentation family
of Fig. 3. For each value of $z = P(B, 0 | A) \in [1/2, 1]$,
Fig. 5 plots $\mathbf{E}$, $\chi$, $\zeta$, and $\varphi$ stacked in a way so that their
sum $H[\mathcal{R}_0]$ is the top curve. One immediately sees that
E is independent of z. This is as it should be since E is 
a function only of the observed process and, by construc- 
tion, the parametrized presentation always generates the 
Golden Mean Process. All of the other measures change 
as the presentation changes, however. Let's explore what 
they tell us. 

Beginning with $z = 1/2$, we recover the Markov chain
presentation of Fig. 2(a). In this presentation, all of the
state information $H[\mathcal{R}_0]$ is contained within $H[X_{:0}]$. This
is represented by the leftmost information diagram at the
top of Fig. 5. Loosely, we say that the state information
contains only information from the past. However, one 
must keep in mind that the presentation still captures E 
bits of information, and this information is shared with 
the future. The gauge and oracular informations vanish. 
It turns out that the z = 1/2 presentation is the process's 
forward e-machine, but more on this later. 

As z increases, so do the gauge and oracular informa- 
tions. With this change, the information diagram circle 
for $H[\mathcal{R}_0]$ straddles $H[X_{:0}]$ and $H[X_{0:}]$, as shown in the
central information diagram atop Fig. 5. This indicates
that the state information now consists of historical in- 
formation, oracular information, and also gauge informa- 
tion. For all values of z, the overlap that H[Ro] has with 
the intersection of the past and future is constant. This 
is because each presentation generates the process and so 
each must capture E bits of shared information. 

Finally, when $z = 1$, the circle for $H[\mathcal{R}_0]$ is
completely contained inside the future $H[X_{0:}]$. Now, the
information diagram resembles the right-most one atop
Fig. 5. There is no crypticity, no gauge information, but
there is oracular information. The interpretation is that 
the state information, apart from E, consists only of in- 
formation from the future. As we will see, the z = 1 
presentation corresponds to the time-reversed presenta- 
tion of the reverse e-machine. And, since the Golden 
Mean Process is a reversible Markov chain, the z = 1 
information diagram mirrors the diagram for $z = 1/2$.



B. e-Machines 

We discussed processes in the context of generators, as 
represented by Markov chains and hidden Markov mod- 
els, but another important aspect concerns prediction. 
As we will show, e-machines are a natural consequence of 
this perspective, and they provide a much richer analysis 
of irreversibility. Additionally, their uniqueness provides 
a solution to the multiplicity of HMM presentations. 

Consider again a process's output sequence and, now, 
interpret time as increasing with the index. The result is 
a time-series $\ldots X_{t-1} X_t X_{t+1} \ldots$. Our goal is to construct
a model that predicts future observations. Specifically, 
we want to find sufficient statistics that preserve our abil- 
ity to predict. Translating this into a concrete procedure, 
we first remove redundancies in the time-series, by group- 
ing histories that lead to the same distribution over fu- 
tures: 






$$P(X_{0:} | X_{:0} = x_{:0}) = P(X_{0:} | X_{:0} = x'_{:0}) .$$



The grouping defines an equivalence relation over his- 
tories and, thus, partitions the space of histories. This 
partition is the coarsest one that provides optimal pre- 
diction. It is called the process's causal state partition. 
Each equivalence class is known as a causal state and, 
thus, to each causal state, there is a unique distribution
over futures [5, 31-33]. The set of causal states is denoted
$\boldsymbol{\mathcal{S}}$.

Now, consider a semi-infinite history $X_{:0} = x_{:0}$ which,
by the causal state equivalence relation, induces causal
state $\mathcal{S}_0 = \sigma_0$. If we append a new observation, we get
$X_{:1} = x_{:0} x_0$ which, in turn, induces $\mathcal{S}_1 = \sigma_1$. In this

FIG. 6. The dynamic over the causal states is induced by
the dynamic over the (semi-infinite) histories. For example,
a history $X_{:0}$ ending at $t = -1$ maps to causal state $\mathcal{S}_0$.
When a new symbol $X_0$ is appended to the old history, we
induce a new causal state $\mathcal{S}_1$.



sense, there is a natural dynamic over the causal states 
that is induced by the dynamic over the observed sequences.
This dynamic is represented in Fig. 6. The
pair of causal states and transition dynamic is called a 
process's e-machine. 

Generally, the set of causal states can be uncountable, 
countable, or finite; see, for example, Fig. 17 in Ref. [8].
Even when the set is not finite, the set that is visited in- 
finitely often may be finite. The infinitely visited subset 
defines the recurrent causal states. All other states are 
transient causal states and not the subject of our discus- 
sion here. Now, when the set of recurrent causal states is 
finite, then the e-machine — obtained by partitioning his- 
tories for the purposes of prediction — is representable as 
a finite-state unifilar hidden Markov model. We denote 
the transition matrices in the same way, except that we 
use $\mathcal{S}$ as the state random variables, which take on values
from $\boldsymbol{\mathcal{S}}$:

$$T_x(\alpha, \beta) = P(X_0 = x, \mathcal{S}_1 = \beta | \mathcal{S}_0 = \alpha) .$$

e-Machines with a finite number of recurrent states gen- 
erate a subset of the finitary processes — processes with 
finite excess entropy. This subset represents a strictly 
larger set of processes than finite-state Markov chains 
since it includes processes with measures over strictly 
sofic |Mj shifts. 

The e-machine is the unique minimal presentation in the class
of unifilar hidden Markov models [32, 33] and, thus, it defines
a canonical presentation for a given process. There
are other benefits. For one, e-machine unifilarity allows
one to directly calculate the process entropy rate. Early
on, Shannon pointed out that this is always possible to
do with Markov chains. It was soon discovered that it
is not possible using nonunifilar hidden Markov models
[15]. Nonunifilarity makes each presentation state appear
more random than it actually is. For a more detailed
treatment of e-machines, see Ref. [32].
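
As a minimal sketch of why unifilarity matters for this calculation: for a unifilar presentation the entropy rate is the state-averaged uncertainty over outgoing edges, $h_\mu = -\sum_\alpha \pi(\alpha) \sum_{x, \beta} T_x(\alpha, \beta) \log_2 T_x(\alpha, \beta)$. The code below assumes labeled matrices `T[x][a, b] = P(x, b | a)` and uses a standard two-state presentation of the Golden Mean Process as a hypothetical example; the function names are illustrative.

```python
import numpy as np

def is_unifilar(T):
    """Each (state, symbol) pair leads to at most one successor state."""
    return all(((Tx > 0).sum(axis=1) <= 1).all() for Tx in T.values())

def stationary(T):
    """Stationary distribution of the internal chain sum_x T[x]."""
    M = sum(T.values())
    n = M.shape[0]
    A = np.vstack([M.T - np.eye(n), np.ones(n)])
    b = np.append(np.zeros(n), 1.0)
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def entropy_rate(T):
    """Entropy rate in bits per symbol; only valid for unifilar models."""
    if not is_unifilar(T):
        raise ValueError("closed-form entropy rate requires a unifilar model")
    pi = stationary(T)
    h = 0.0
    for Tx in T.values():
        mask = Tx > 0
        weights = (pi[:, None] * Tx)[mask]   # pi(a) * T_x(a, b) on each edge
        h -= float(weights @ np.log2(Tx[mask]))
    return h

# Assumed Golden Mean presentation (states A=0, B=1): from A emit 1 and stay
# in A or emit 0 and go to B, each with probability 1/2; from B emit 1 and
# return to A with probability 1.
golden_mean = {
    0: np.array([[0.0, 0.5], [0.0, 0.0]]),
    1: np.array([[0.5, 0.0], [1.0, 0.0]]),
}
h_mu = entropy_rate(golden_mean)
```

For a nonunifilar model the same sum would overestimate the entropy rate, which is why the sketch refuses to evaluate it in that case.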

We pause briefly to point out that the unifilarity property
of the e-machine is a consequence of the equivalence
relation. It has been known for some time [H [21] [35] that 
there are nonunifilar hidden Markov models of processes
that can be smaller, sometimes substantially smaller,
than the process's e-machine. However, finding a canon- 
ical presentation within the class of nonunifilar hidden 
Markov models is a task that has evaded solution. One 
obvious choice is to focus on the hidden Markov model 
that minimizes the state entropy; see Ref. [35] for further
discussion. Since our goal is to analyze the role that
structure plays in irreversibility, having a canonical rep- 
resentation is essential. So, our focus on e-machines is 
based, in part, on practicality since one can calculate the 
e-machine from any alternative presentation. It is also 
theoretically useful since many quantities — such as the 
process entropy rate and excess entropy — are not exactly 
calculable from nonunifilar presentations. Additionally, 
the states of nonunifilar presentations are not sufficient 
statistics for the histories. The consequence is that one 
cannot forget the past and work with an individual state 
in a general hidden Markov model — instead, one must 
work with a distribution over the states [36].

Our discussion of processes began by pointing out 
that time is merely an interpretation of the indices 
on a set of random variables. Thus far, we described 
e-machines from the forward perspective — yielding the 
forward e-machine, denoted $M^+$. Similarly, following
Refs. [20, 21], one can partition futures for the purposes
of retrodiction, and this partitioning induces a dynamic
over the reverse causal states. The resulting unifilar hidden
Markov model is known as the reverse e-machine,
denoted $M^-$. To differentiate the states in each hidden
Markov model, we let $\mathcal{S}^+$ represent the random variables
for the forward causal states and use $\mathcal{S}^-$ for the reverse
causal states. The equivalence relations used during partitioning,
$\sim^+$ and $\sim^-$, are generally distinct. We use
$\epsilon^+ : \mathbf{X}_{:0} \to \boldsymbol{\mathcal{S}}^+$ to denote the mapping that takes a history
and returns the forward causal state into which the history
was partitioned. Similarly, we use $\epsilon^- : \mathbf{X}_{0:} \to \boldsymbol{\mathcal{S}}^-$ to
denote the mapping from futures to reverse causal states.

To orient ourselves, Fig. 7 places the relevant random
variables on a lattice. The $X$ variables denote the observed
process of the hidden Markov model, which is
broken up into the past (orange) and future (purple) observation
sequences. The hidden causal states are represented
by the $\mathcal{S}$ variables. In the present, we have $\mathcal{S}_0^+$
and $\mathcal{S}_0^-$ straddling the past and future. If one scans the
observed variables in the positive direction (seeing $X_{-3}$,
$X_{-2}$, and $X_{-1}$), then that history takes one to causal
state $\mathcal{S}_0^+$. Analogously, if one scans in the reverse direction,
then the succession of variables $X_2$, $X_1$, and $X_0$
leads to $\mathcal{S}_0^-$.

Summarizing, we represent each e-machine as a commuting
diagram that operates on the hidden process lattice,
using $x$ and $\sigma$ to represent symbol and causal state
realizations, respectively:






FIG. 7. Hidden Process Lattice: The $X$ variables denote the
observed process; the $\mathcal{S}$ variables, the hidden causal states.
If one scans the observed variables in the positive direction
(seeing $X_{-3}$, $X_{-2}$, and $X_{-1}$), then that history takes one to
causal state $\mathcal{S}_0^+$. Analogously, if one scans in the reverse direction,
then the succession of variables $X_2$, $X_1$, and $X_0$ leads
to $\mathcal{S}_0^-$. The colors indicate which variables participate in the
information measures of Fig. 8.







FIG. 8. Information diagram for the forward and reverse 
e-machines. 





For the forward e-machine $M^+$, every past $x_{:0}$ maps to
a unique next past $x_{:1}$ on symbol $x_0$. By the forward-looking
map $\epsilon^+$, each past $x_{:0}$ corresponds to a unique
causal state $\sigma_0^+$. This many-to-one correspondence induces
a dynamic on the causal states such that $\sigma_0^+$ transitions
to $\sigma_1^+$ on symbol $x_0$. Similarly, for the reverse
e-machine $M^-$, every future $x_{1:}$ maps to a unique next
future $x_{0:}$ on symbol $x_0$. The reverse-looking map $\epsilon^-$ associates
$x_{1:}$ with $\sigma_1^-$. The many-to-one correspondence
induces a dynamic on the reverse causal states such that
$\sigma_1^-$ transitions to $\sigma_0^-$ on symbol $x_0$.

Finally, we gather the forward and reverse e-machines
in Fig. 8. Together, they provide complementary views of
the process. For example, the minimal amount of information
one must store in order to generate the process in
the forward direction defines the forward statistical complexity
$C_\mu^+ = H[\mathcal{S}^+]$. This information, in general, is not
equal to the minimal amount of information one requires
for retrodiction $C_\mu^- = H[\mathcal{S}^-]$ [Ml HIl E]. Notably, the
e-machine has no gauge information since it is minimal
and, also, no oracular information since it is unifilar. Referring
briefly back to Figs. 4 and 5, when $z = 1/2$, we
have the forward e-machine. When $z = 1$, we have the
time-reversed presentation of the reverse e-machine $M^-$.
The interpretation is direct: The crypticity $\chi^-$ of the
reverse e-machine becomes oracular information $\zeta$ in the
time-reversed presentation.

Our preference, from now on, is to use the forward and 
reverse e-machines. Given the forward e-machine $M^+$, we
can construct, via Eq. (7), a reverse generator of the process,
$\widetilde{M}^+$. However, this is just one presentation among
many possible reverse generators of the process. So, we
operate on that reverse generator, using techniques from
Ref. [21], and obtain the reverse e-machine $M^-$. Together,
the forward and reverse e-machines serve as the
basis for understanding processes through the use of generators.
In Sec. VII, we unify the two e-machines into a
single machine and discuss its meaning in the context of 
the decomposition of state information. 



C. Finite State Automata 

One interesting property of e-machines, and hidden 
Markov models in general, is that they are intimately re- 
lated to automata in formal language theory [37]. Here, 
we briefly review their relationship. 

Given a process $\mathcal{P}$, we can examine the set of all words
that occur with positive probability. This set is known as 
the support of the process's stochastic language. Strip- 
ping away the transition probabilities of any finite-state 
hidden Markov model leaves a finite-state automaton 
that generates the support of the process language. So, 
we see that the support of a process generated by a finite-state
HMM always corresponds to a regular language. If
the hidden Markov model is unifilar, then the resulting
structure, without probabilities, is equivalent to a
deterministic finite automaton (DFA). Similarly, nonunifilar
hidden Markov models map to nondeterministic finite
automata (NFA).

However, it is necessary to point out that there are 
quite drastic differences between formal and process lan- 
guages. While DFAs and NFAs are equivalent in the set 
of formal languages that each can represent using a finite
number of states, the same is not true of hidden Markov 
models. In fact, there are finite-state nonunifilar HMMs 
that have no corresponding finite-state unifilar counterpart.
One well-known example is the Simple Nondeterministic
Source of Ref. [5]. It can be represented as a two-state
nonunifilar HMM, but its e-machine (the smallest
unifilar HMM generating the same process) requires a
countably infinite number of states.

Since it will be useful to compare topological proper- 
ties to statistical properties, we define M^ and AI^ as 
the deterministic finite-state automata corresponding to 
the forward and reverse e-machines with all probabili- 
ties removed. Note that these DFAs need not be the 
minimal deterministic finite-state automata [37] gener- 
ating the support, and this fact highlights the differ- 
ence between the causal-state equivalence relation and 
the Nerode state-equivalence relation in formal language 
theory. If we subsequently minimize M7' and M^ , we are 
left with the minimal and unique DFAs that generate the 
support, respectively denoted $D^+$ and $D^-$.

Also, we mention that there is a large body of literature
in formal language theory concerning $k$-reversible
languages [38-42]. This topic does not relate directly to
our notion of reversibility and is rather closer to addressing
a process's Markov order; cf. Ref. [43].

One can view e-machines as probabilistic counterparts 
to DFAs. In fact, the relation between formal language 
theory and stochastic languages can be extended. Just 
as there is a hierarchy of models in formal language the- 
ory, one can consider a hierarchy of stochastic models as 
well. See, for example, the process hierarchy proposed in 
Ref. ig. 



D. Reversibility Revisited 

By focusing on e-machines, we side-step the represen- 
tational degeneracy of hidden Markov models. Recalling 
the uniqueness of the forward and reverse e-machines, 
we note that properties of the e-machine can also be in- 
terpreted as properties of the process. This also allows 
us to consider additional measures of reversibility that 
are based on structural properties of the e-machine. So, 
while each of the forthcoming definitions can be stated 
strictly in terms of the process's probability distribution, 
we prefer to use equivalent definitions in terms of the 
forward and reverse e-machines. This is akin to studying 
formal languages through the use of the minimal DFAs. 

As Ref. [1] demonstrated, there is a finite procedure 
for determining whether two finite-state hidden Markov 
models generate the same process language. By Eq. (8),
this technique also provides a method for determining 
whether a process is reversible or not. An alternate tech- 
nique involves the forward and reverse e-machines. With 
them, one simply asks if the two machines are identical
to each other. If so, then the process is reversible.
In Ref. [21], this property was termed microscopic reversibility
and we write: $M^+ = M^-$.

We can also consider several weaker forms of reversibil- 
ity. For example, as we noted, the process that repeats 
ABC . . . indefinitely is not reversible, but the e-machines 
are essentially the same in that the amount of information
one requires for prediction equals the amount required
for retrodiction. Following Ref. [21], a process is
causally reversible if and only if $C_\mu^+ = C_\mu^-$.

In terms of topology, we say that a process is support
reversible if and only if $D^+ = D^-$, where equality
means that the DFAs must be identical under an isomorphism
over the states. Finally, we also consider symbol
isomorphisms. If there exists an isomorphism from the
output alphabet of $M^+$ to the output alphabet of $M^-$
that renders the two machines equal, then we say that the
process is reversible under symbol isomorphism, denoted
$M^+ \cong M^-$. Similarly, the process is support reversible
under symbol isomorphism if and only if $D^+ \cong D^-$.



VI. EXAMPLES 

This section exercises the preceding theory, giving 
a number of additional results and illustrating them 
through example processes and presentations. We start 
with an exploration of which kinds of reversibility there 
can be. Then we analyze in detail two example ir- 
reversible processes, one with a rather counterintuitive 
property. The analyses give a concrete understanding 
of how irreversibility arises and what its structural con- 
sequences are for a process. The section closes with a 
survey that demonstrates the dominance of irreversibil- 
ity among processes. 



A. Causal Reversibility Roadmap 

Given these various notions of reversibility, a natural 
question comes to mind: What combinations are possi- 
ble? To this end, we state a number of straightforward 
relationships: 



\begin{align}
M^+ = M^- &\Rightarrow C_\mu^+ = C_\mu^- \tag{10} \\
M^+ = M^- &\Rightarrow D^+ = D^- \tag{11} \\
M^+ = M^- &\Rightarrow M^+ \cong M^- \tag{12} \\
M^+ \cong M^- &\Rightarrow D^+ \cong D^- \tag{13} \\
D^+ = D^- &\Rightarrow D^+ \cong D^- \tag{14}
\end{align}

Now, let us restrict attention to just the causally reversible
processes ($C_\mu^+ = C_\mu^-$) and examine microscopic
and support reversibility, with and without symbol isomorphisms.
That is, we consider the combinations of
the four properties (i) $M^+ = M^-$, (ii) $M^+ \cong M^-$,
(iii) $D^+ = D^-$, and (iv) $D^+ \cong D^-$. Of the 16 possible
Boolean-vector combinations, only 6 are possible due
to Eqs. (10)-(14).



Table I gives example forward and reverse e-machine
pairs for each of the 6 possibilities. What we learn from 
these examples is that causal reversibility indeed captures 
a larger class of processes than microscopic reversibility. 
However, it also captures a bit more, including processes 
that are not isomorphic to one another under a sym- 
bol isomorphism. The table also demonstrates that irre- 
versibility is not only a topological concern — the forward 
and reverse DFAs can be identical while the generated 
process languages are not. 
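
The count of six admissible combinations can be checked by brute force. The sketch below assumes the reading of the implications in which exact equality of the machines implies both exact equality of the DFAs and equality of the machines up to symbol isomorphism, and each of those in turn implies equality of the DFAs up to symbol isomorphism; the property labels (i) through (iv) follow the text, while the function name is illustrative.

```python
from itertools import product

def allowed(i, ii, iii, iv):
    """Check one Boolean vector against the assumed implications.

    (i)   machines identical           (ii) machines identical up to symbol isomorphism
    (iii) support DFAs identical       (iv) support DFAs identical up to symbol isomorphism
    """
    def implies(a, b):
        return (not a) or b
    return (
        implies(i, iii)        # identical machines have identical supports
        and implies(i, ii)     # identity is a trivial symbol isomorphism
        and implies(ii, iv)    # isomorphic machines have isomorphic supports
        and implies(iii, iv)   # identical DFAs are trivially isomorphic
    )

# Enumerate all 16 Boolean vectors and keep the consistent ones.
combos = [c for c in product([False, True], repeat=4) if allowed(*c)]
n_possible = len(combos)
```

Under these assumptions exactly six vectors survive: all four properties true, all four false, and the four cases in which only (iv) is forced to hold.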



B. Causal Irreversibility 

Irreversible processes are ubiquitous, even among those 
represented by finite-state e-machines. In our first ex- 
ample, we ground intuitions with a process whose irre- 
versibility is driven topologically. The example is par- 
ticularly illustrative since its e-machines have a finite 
number of causal states. In the second example, we ex- 
amine an irreversible process whose forward and reverse 
DFAs are identical; this demonstrates that irreversibil- 
ity can arise purely probabilistically. Then, in the third 
example, we see the extent to which probability aggra- 
vates irreversibility when it causes a finite-state forward 
e-machine to become an infinite-state reverse e-machine. 



1. Support-Driven Irreversibility

The first example we consider shows that a process can 
have different, but finite, numbers of forward and reverse 
causal states. Formally, Ref. [3T] provides the technique 
for calculating the reverse e-machine via operations on 
the graph structure of the forward e-machine but, for 
pedagogical reasons, both the forward and reverse causal 
states are constructed in terms of Xt only [H] . 

Consider the time series over the alphabet $\{0, 1, 2\}$
whose forward ($M^+$) and reverse ($M^-$) e-machines are
shown in Fig. 9. As we will show, the process language
generated by $M^+$ is irreversible and, additionally, this
irreversibility is due to an underlying topological irreversibility.
That is, $D^+ \neq D^-$ implies that $M^+ \neq M^-$.

To see the topological irreversibility, note that in $M^+$,
$w = 01$ is a valid word: Start in $A$, see 0 and stay in $A$,
then see 1 and go to $B$. However, $\widetilde{w} = 10$ is not a valid
word. We can also see this in a slightly different light by
noting that $w$ is valid in $M^+$, but not valid in $M^-$.
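
A minimal sketch of this support check, assuming the forward machine's transition structure is as described for Fig. 9 in this section (state A on histories ending in 0 or 2, state B on histories ending in 1). The transition table and function names are a hypothetical encoding, with all probabilities stripped away.

```python
from itertools import product

# Assumed support automaton of the forward machine: (state, symbol) -> next state.
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "1"): "B", ("B", "2"): "A",
}
STATES = ("A", "B")

def is_valid(word, delta=DELTA, states=STATES):
    """A word is in the support if some start state can emit it."""
    for start in states:
        current = start
        for symbol in word:
            current = delta.get((current, symbol))
            if current is None:
                break            # no edge for this symbol: try next start state
        else:
            return True          # every symbol was emitted without getting stuck
    return False

def support(length, alphabet="012"):
    """All valid words of the given length."""
    words = ("".join(w) for w in product(alphabet, repeat=length))
    return {w for w in words if is_valid(w)}

forward_words = support(2)
reversed_words = {w[::-1] for w in forward_words}
```

Comparing `forward_words` with `reversed_words` exposes the asymmetry at length 2: the former contains 01 but not 10.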

TABLE I. Diversity of causally reversible processes ($C_\mu^+ = C_\mu^-$): Example presentations for forward and reverse e-machine
pairs, with the same number of states, for the 6 possible combinations; all other combinations are impossible.

To understand the forward causal states, consider the
distribution of $X_0 \in \{0, 1, 2\}$ conditioned on length-1 history suffixes:

$$
\begin{aligned}
P(X_0 | X_{-1} = 0) &= (1/2, 1/2, 0), \\
P(X_0 | X_{-1} = 1) &= (0, 1/2, 1/2), \text{ and} \\
P(X_0 | X_{-1} = 2) &= (1/2, 1/2, 0).
\end{aligned}
$$

We see that the time series generated by this machine has
the following characteristics: Every history that ends on
symbol 0 or 2 is followed by either 0 or 1, each with probability
1/2, but never by symbol 2. Hence, with regard to the
distribution of a one-step future, all histories ending on 0
or 2 are equivalent and we denote this class of equivalent
histories as causal state $A$. The distribution of symbols
following words ending on symbol 1 is different. They are
followed by either symbol 1 or 2, each with probability 1/2,
but never by symbol 0. All histories ending in 1 are hence
equivalent with respect to the distribution of a one-step
future and we denote their equivalence class as state $B$.
States $A$ and $B$ partition the entire space of allowable
histories. The fact that the equivalence class of
a history is determined solely by the last symbol is re- 
flected by the time series of symbols having Markov order 
1. The reader should verify that, in this particular exam- 
ple, Markovity also means that the partition obtained by 
examining one-step futures is equivalent to the partition 
obtained by examining arbitrary $L$-step futures. From

FIG. 9. The forward (top left) $M^+$ and reverse (bottom left) $M^-$ e-machines for a causally irreversible process. Note that
$D^+ \neq D^-$ and, thus, $M^+ \neq M^-$. The forward causal states $\boldsymbol{\mathcal{S}}^+$ (top right) partition all allowable histories $\ldots X_{-1}$. In this
example, the states are uniquely characterized by specifying the most recent symbol $X_{-1}$. For example, any valid history ending
with a 0 or 2 maps to state $A$, and the possible futures can begin with a 0 or 1. The incoming edges of $M^+$ correspond to
histories ($X_{-1}$), while its outgoing edges correspond to futures ($X_0$). The reverse causal states $\boldsymbol{\mathcal{S}}^-$ (bottom right) partition all
allowable futures $X_0 \ldots$. In this example, the states are uniquely characterized by specifying the earliest symbol of each future.
For example, any valid future beginning with a 0 maps to state $C$ and the associated histories must end with a 0 or 2. The
incoming edges of $M^-$ correspond to futures ($X_0$), while its outgoing edges correspond to histories ($X_{-1}$).



this, we see that $\boldsymbol{\mathcal{S}}^+ = \mathbf{X}_{:0} / \sim^+$ consists of:

$A = \{\ldots 0, \ldots 2\}$ and
$B = \{\ldots 1\}$,

where an ellipsis stands for any valid past.
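
The forward partition can be reproduced mechanically by grouping length-1 histories that share the same one-step conditional distribution. This is a minimal sketch that relies on the process having Markov order 1, so that one-step distributions suffice; the variable and function names are illustrative.

```python
from fractions import Fraction as F

# Rows are P(X_0 | X_{-1} = s) over the symbols (0, 1, 2), as quoted in the text.
P_NEXT = {
    0: (F(1, 2), F(1, 2), F(0)),
    1: (F(0), F(1, 2), F(1, 2)),
    2: (F(1, 2), F(1, 2), F(0)),
}

def group_by_distribution(conditionals):
    """Group conditioning symbols whose conditional distributions coincide."""
    classes = {}
    for symbol, dist in conditionals.items():
        classes.setdefault(dist, []).append(symbol)
    return sorted(classes.values())

# Each class is one forward causal state, labeled by the last symbol seen.
forward_states = group_by_distribution(P_NEXT)
```

The two resulting classes correspond to states A (histories ending in 0 or 2) and B (histories ending in 1).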

The partition is represented graphically in the matrix 
at the top-right of Fig. 9. In it, we independently rearranged
the histories and futures so as to cluster the block
structures within the matrix. For each history $X_{-1}$, the
distribution over futures $X_0$ is (topologically) represented
as a column. Histories with the same column colorings
belong to the same equivalence class under the forward
equivalence relation $\sim^+$. Finally, note that the futures
are not partitioned by the forward equivalence relation
since $X_0 = 1$ is allowable from both $A$ and $B$.

To understand the reverse causal states, we examine
the distribution of symbols preceding the future. Since
the Markov order does not change when analyzing the
time series in the reverse direction (see the Appendix), the
equivalence class of a future is determined solely by the first
symbol of the future. Additionally, equality of distributions
over length-1 histories implies equality over arbitrary
length-L history distributions. Thus, for $X_{-1} = (0, 1, 2)$
conditioned on a length-1 future, we have:

$P(X_{-1} | X_0 = 0) = (1/2, 0, 1/2)$,
$P(X_{-1} | X_0 = 1) = (1/4, 1/2, 1/4)$, and
$P(X_{-1} | X_0 = 2) = (0, 1, 0)$.

Any word starting with symbol 0 can only be preceded
by symbols 0 or 2 with probability 1/2 each, but never
with symbol 1. Correspondingly, all futures starting with
symbol 0 are equivalent and their equivalence class is
denoted as reverse causal state C. Furthermore, any word
starting with symbol 1 is preceded by symbols 0 or 2
with probability 1/4 each or is preceded by symbol 1
with probability 1/2. All futures starting with symbol 1
are equivalent with respect to the distribution of preceding
symbols and subsumed as reverse causal state D. Finally,
words starting with symbol 2 can only be preceded
by symbol 1. The equivalence class of futures starting on
symbol 2 is denoted reverse causal state E. From this,
we see that $\mathcal{S}^- = X_{0:}/\sim^-$ consists of:

C = {0...},
D = {1...}, and
E = {2...},

where an ellipsis now stands for any valid future.

States C, D, and E partition the space of allowable
futures. They are represented in the lower-right matrix
of Fig. 9. In it, we rearranged the histories and futures
so as to cluster the block-structures within the matrix.
For each future $X_0$, the distribution over histories $X_{-1}$ is
(topologically) represented as a row. Each row coloring
is distinct, reflecting the fact that each future belongs to
a distinct reverse causal state under the reverse equivalence
relation $\sim^-$. Finally, note that the histories are
not partitioned by the reverse equivalence relation since
$X_{-1} = 0$, for example, is allowable from both C and D.

Note how the space of histories is partitioned into
only two equivalence classes, while the space of futures
is partitioned into three equivalence classes. Any first-order
Markov chain on k symbols has at most k causal
states. That we only have two forward causal states is
due to the fact that the future distributions after seeing
symbols 2 and 0 are equivalent. This equivalence,
however, does not hold in the reverse direction and, so,
there are three reverse causal states. The asymmetry
is further exemplified by the forward ε-machine having
smaller statistical complexity than the reverse ε-machine:
$C_\mu^+ = 1$ bit $< C_\mu^- = 3/2$ bits. For this particular process,
it takes 1/2 bit more memory, on average, to generate
the same string of symbols from right to left than from
left to right.
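
The partitions and complexities of this example can be checked numerically. The following is a minimal Python sketch and not part of the original analysis: it starts from the reverse conditionals $P(X_{-1}|X_0)$ quoted above, solves for the stationary symbol distribution, Bayes-inverts to obtain $P(X_0|X_{-1})$, and groups identical rows into causal states. The helper names (`stationary`, `group_by_row`, `entropy_bits`) are illustrative assumptions, and grouping by a single symbol relies on the process being first-order Markov.

```python
from fractions import Fraction as F
from math import log2

SYMBOLS = (0, 1, 2)

# P(X_{-1} = column | X_0 = row), as quoted in the text.
REVERSE = {
    0: (F(1, 2), F(0), F(1, 2)),
    1: (F(1, 4), F(1, 2), F(1, 4)),
    2: (F(0), F(1), F(0)),
}

def stationary(rows):
    """Solve pi = pi R exactly, where R[i][j] = rows[SYMBOLS[i]][j] is row-stochastic."""
    n = len(SYMBOLS)
    # Equations (R^T - I) pi = 0, with the last one replaced by sum(pi) = 1.
    a = [[rows[SYMBOLS[j]][i] - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    a[-1] = [F(1)] * n
    b = [F(0)] * (n - 1) + [F(1)]
    for col in range(n):  # Gauss-Jordan elimination with exact fractions
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        b[col] = b[col] / scale
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
                b[r] = b[r] - factor * b[col]
    return dict(zip(SYMBOLS, b))

pi = stationary(REVERSE)

# Bayes inversion: P(X_0 = b | X_{-1} = a) = P(X_{-1} = a | X_0 = b) pi(b) / pi(a).
FORWARD = {a: tuple(REVERSE[b][a] * pi[b] / pi[a] for b in SYMBOLS) for a in SYMBOLS}

def group_by_row(rows):
    """Merge symbols whose conditional distributions (rows) are identical."""
    groups = {}
    for label in SYMBOLS:
        groups.setdefault(rows[label], []).append(label)
    return list(groups.values())

def entropy_bits(probabilities):
    return -sum(float(p) * log2(float(p)) for p in probabilities if p > 0)

forward_states = group_by_row(FORWARD)  # histories grouped by their last symbol
reverse_states = group_by_row(REVERSE)  # futures grouped by their first symbol
c_plus = entropy_bits([sum(pi[s] for s in g) for g in forward_states])
c_minus = entropy_bits([sum(pi[s] for s in g) for g in reverse_states])
print(forward_states, reverse_states, c_plus, c_minus)
```

Under these assumptions the sketch groups the histories ending in 0 and 2 together, keeps all three futures distinct, and reproduces the 1 bit and 3/2 bit complexities stated above.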

Comparing the causal states as represented in Fig. 9,
we see that each equivalence relation also defines a partition
over the set $(X_{-1}, X_0)$. This, in turn, extends to
a partition over bi-infinite strings. So, we can think of
the forward (reverse) ε-machine as the restriction of this
partition to the set of histories (futures). The partition
over the bi-infinite strings must be such that when it is
restricted to histories (futures) it induces a unifilar dynamic
over equivalence classes. This particular point will
be important when we discuss the bidirectional machine
in Sec. VII.



2. Probability-Driven Irreversibility

In our second example, we show that irreversibility can
have purely probabilistic origins. We do this with an
irreversible, order-2 Markov process that has a reversible
support. Figure 10 presents the recurrent components
of the forward and reverse ε-machines, $M^+$ and $M^-$. To
see that the support is reversible, note that the ε-machine
structures, without probabilities, are equal: $\mathcal{M}^+ = \mathcal{M}^-$.
This implies that $D^+ = D^-$, but it can also be seen
directly since the topologies, in this example, are already
minimal deterministic finite automata. The practical
consequence of having a reversible support is that
$P(w) > 0$ if and only if $P(\widetilde{w}) > 0$, with $\widetilde{w}$ the word $w$ reversed.

FIG. 10. The forward (top) $M^+$ and reverse (bottom) $M^-$
ε-machines of an irreversible, order-2 Markov process. The
process is irreversible since $M^+ \neq M^-$. However, the support
is reversible since the underlying topologies of each ε-machine
are the same: $\mathcal{M}^+ = \mathcal{M}^-$.

Beginning with the forward causal states, we exam- 
ine the distribution of symbols that succeed histories. 
Since the process is order-2 Markovian, we calculate fi- 
nite histories and futures instead of semi-infinite histo- 
ries and futures. Specifically, partitioning length-2 his- 
tories based on the conditional distributions of length- 
2 futures yields the same result as partitioning semi- 
infinite histories based on the conditional distributions 
of arbitrary length futures |1S]. We directly calculate 
$P(X_0, X_1 | X_{-2}, X_{-1})$ as a right-stochastic matrix, finding:

$P(X_0, X_1 | X_{-2}, X_{-1})$
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The forward causal states are groupings of histories and, 
in this presentation, they correspond to groupings of 
identical rows. For example, the rows corresponding to 
01 and 21 are identical and, so, are grouped into the same 
equivalence class. Translating these history suffixes back 
into semi-infinite histories, we find that $\mathcal{S}^+ = X_{:0}/\sim^+$
consists of:

A = {...02, ...20},
B = {...01, ...21},
C = {...10}, and
D = {...12}.

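The reverse causal states are similarly obtained, but
now we consider the distribution of symbols that precede
futures. Once again, we work with finite-length histories
and futures. Using a right-stochastic matrix, we calculate
$P(X_{-2}, X_{-1} | X_0, X_1)$ directly, with rows labeled by
futures $x_0 x_1$ and columns by histories $x_{-2} x_{-1}$:

$$
P(X_{-2}, X_{-1} \mid X_0, X_1) =
\begin{array}{c|cccccc}
   & 01   & 02  & 10  & 12  & 20  & 21  \\ \hline
01 & 0    & 0   & 0   & 1   & 0   & 0   \\
02 & 1/4  & 0   & 0   & 0   & 0   & 3/4 \\
10 & 0    & 1/2 & 0   & 1/4 & 1/4 & 0   \\
12 & 0    & 1/2 & 0   & 1/4 & 1/4 & 0   \\
20 & 1/4  & 0   & 0   & 0   & 0   & 3/4 \\
21 & 1/12 & 0   & 2/3 & 0   & 0   & 1/4
\end{array}
$$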

The reverse causal states are groupings of futures, and 
this corresponds to groupings of identical rows in the 
matrix. Translating these future prefixes into semi-infinite
futures, we find that the reverse causal states
$\mathcal{S}^- = X_{0:}/\sim^-$ consist of:

E = {02..., 20...},
F = {10..., 12...},
G = {01...}, and
H = {21...}.

Since there are multiple perspectives involved, we detour
briefly to translate the matrix $P(X_{-2}, X_{-1} | X_0, X_1)$
onto the reverse ε-machine shown in Fig. 10. One perspective,
the global perspective, is the process lattice of
Fig. 7 that defines forward as a left-to-right movement
and reverse as a right-to-left movement. The other perspective,
the local perspective, is from the ε-machine's
vantage point that is concerned only with its own local
time. That is, the causal-state dynamic always proceeds
"forward" in time, irrespective of how forward is defined
in the global perspective. For the reverse ε-machine, this
means its outgoing transitions translate to right-to-left
movements on the lattice. To demonstrate, consider the
element:

$P(X_{-2} = 2, X_{-1} = 1 | X_0 = 2, X_1 = 0) = 3/4$.

The joint word is $x_{-2} x_{-1} x_0 x_1 = 2120$. To verify that
this is a valid word in the process, one scans the word
from right-to-left following transitions on $M^-$. Focusing
only on $x_0 x_1 = 20$, if we begin in reverse causal state F,
then we transition to state G on symbol 0 and, finally, to
state E on symbol 2. This is precisely the statement of
the reverse causal-state partition: any future beginning
with 20 leads (when scanned from right-to-left) to reverse
causal state E. Continuing from E, we see $x_{-2} x_{-1} = 21$
first by transitioning to state F on symbol 1 and then
again to state H on symbol 2. The total probability of
this conditional path is 3/4.
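
The right-to-left scan can be sketched in a few lines of Python. This is an illustration rather than the authors' procedure: the transition table `REVERSE_MACHINE` is an assumption, inferred from the rows of $P(X_{-2}, X_{-1} | X_0, X_1)$ and from the path just described, since the edge labels of Fig. 10 are not reproduced in the text. The function name `scan_right_to_left` is likewise hypothetical.

```python
from fractions import Fraction as F

# Assumed transcription of M^-: state -> {symbol: (next_state, probability)}.
# Each edge emits the symbol immediately to the left of what has been read so far.
REVERSE_MACHINE = {
    "E": {1: ("F", F(1))},
    "F": {0: ("G", F(1, 4)), 2: ("H", F(3, 4))},
    "G": {2: ("E", F(1))},
    "H": {0: ("E", F(2, 3)), 1: ("F", F(1, 3))},
}

def scan_right_to_left(state, word):
    """Read `word` from its last symbol to its first; return (final state, path probability)."""
    prob = F(1)
    for symbol in reversed(word):
        edge = REVERSE_MACHINE[state].get(symbol)
        if edge is None:          # the word is not allowed from this state
            return None, F(0)
        state, p = edge
        prob *= p
    return state, prob

# Step 1: read the future x_0 x_1 = 20 starting from F, as in the text.
state_after_future, _ = scan_right_to_left("F", (2, 0))
# Step 2: from there, generate the history x_{-2} x_{-1} = 21.
final_state, path_probability = scan_right_to_left(state_after_future, (2, 1))
print(state_after_future, final_state, path_probability)
```

With the assumed table, the future 20 leads to state E and the history 21 is then generated with probability 3/4, ending in state H, which matches the matrix element considered above.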

To understand where the irreversibility arises, we first
note that the matrix $P(X_0, X_1 | X_{-2}, X_{-1})$ implicitly
contains the information about the dynamic over the forward
causal states. For example, from $x_{-2} x_{-1} = 01$, we can
see $x_0 x_1 = 20$ and $x_0 x_1 = 21$ each with probability 1/4.
Marginalizing and using the forward causal-state partition,
this means that state $B = \epsilon^+(\ldots 01)$ can see symbol
2 with probability 1/2 and when it does, we transition to
state $D = \epsilon^+(\ldots 012) = \epsilon^+(\ldots 12)$.

Our goal is to understand why the edge from F to H 
on symbol 2 occurs with probability 3/4 instead of prob- 
ability 1/2 01]. From the reverse causal-state partition, 
any future beginning with $x_0 x_1 = 10$ will lead into state
$F = \epsilon^-(10 \ldots)$. If we then see $x_{-1} = 2$, we move to state
$H = \epsilon^-(210 \ldots)$. In the matrix for $P(X_{-2}, X_{-1} | X_0, X_1)$,
we now look at the row labeled 10. There, the columns
labeled 02 and 12 correspond to histories with $x_{-1} = 2$.
The probabilities are 1/2 and 1/4, respectively, which 
sum to 3/4. So, indeed, the process is irreversible, de- 
spite having a reversible support. 
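
The comparison between the two directions amounts to a Bayes inversion, which the following Python sketch makes explicit. It is a hedged illustration: the matrix `REVERSE` is transcribed from $P(X_{-2}, X_{-1} | X_0, X_1)$ with zeros assumed wherever no entry is printed, and the forward matrix is derived here from it rather than copied from the source. The helper names are hypothetical.

```python
from fractions import Fraction as F

WORDS = ("01", "02", "10", "12", "20", "21")

# REVERSE[future][i] = P(X_{-2} X_{-1} = WORDS[i] | X_0 X_1 = future); zeros are assumed.
REVERSE = {
    "01": (0, 0, 0, 1, 0, 0),
    "02": (F(1, 4), 0, 0, 0, 0, F(3, 4)),
    "10": (0, F(1, 2), 0, F(1, 4), F(1, 4), 0),
    "12": (0, F(1, 2), 0, F(1, 4), F(1, 4), 0),
    "20": (F(1, 4), 0, 0, 0, 0, F(3, 4)),
    "21": (F(1, 12), 0, F(2, 3), 0, 0, F(1, 4)),
}

def stationary(matrix):
    """Exact solution of p = p R with sum(p) = 1 for a row-stochastic matrix R."""
    n = len(matrix)
    a = [[F(matrix[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    a[-1] = [F(1)] * n                      # replace one balance equation by normalization
    b = [F(0)] * (n - 1) + [F(1)]
    for col in range(n):                    # Gauss-Jordan elimination
        pivot = next(r for r in range(col, n) if a[r][col] != 0)
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        scale = a[col][col]
        a[col] = [v / scale for v in a[col]]
        b[col] = b[col] / scale
        for r in range(n):
            if r != col and a[r][col] != 0:
                factor = a[r][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
                b[r] = b[r] - factor * b[col]
    return b

# Stationary probabilities of the six length-2 words.
word_prob = dict(zip(WORDS, stationary([REVERSE[w] for w in WORDS])))

# Bayes inversion: P(future | history) = P(history | future) P(future) / P(history).
FORWARD = {
    h: tuple(REVERSE[f][i] * word_prob[f] / word_prob[h] for f in WORDS)
    for i, h in enumerate(WORDS)
}

groups = {}
for h in WORDS:                             # identical rows share a forward causal state
    groups.setdefault(FORWARD[h], []).append(h)
forward_states = list(groups.values())

# State B (history 01) seeing symbol 2 next, versus state F (future 10) preceded by 2.
p_B_sees_2 = sum(p for f, p in zip(WORDS, FORWARD["01"]) if f[0] == "2")
p_F_sees_2 = sum(p for h, p in zip(WORDS, REVERSE["10"]) if h[1] == "2")
print(forward_states, p_B_sees_2, p_F_sees_2)
```

Under the stated assumptions the derived forward rows group into {01, 21}, {02, 20}, {10}, and {12}, and the two probabilities come out as 1/2 and 3/4, in line with the discussion above.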



3. Explosive Irreversibility 

Our final example shows that, although a process can 
be represented by a finite number of causal states in one 
direction, its presentation in the reverse direction may 
require a countably infinite number of states. The support
of this process language corresponds to a strictly
sofic shift [34] and, thus, the process is not Markovian.
The consequence is that we must use a hidden Markov
model representation if we want to represent it finitely,
at least in the forward direction [47]. The recurrent components
of the forward and reverse ε-machines are shown
in Fig. 11.

FIG. 11. Explosive irreversibility: Despite the forward ε-machine $M^+$ (left) having just two recurrent causal states, the reverse
ε-machine $M^-$ (right) has a countable infinity of recurrent causal states. Transitions for $M^-$ make use of: $a_n = 2^{n+1}(3 z_n)^{-1}$,
$b_n = 1 - (a_n + c_n)$, $c_n = 3^n (2 z_n)^{-1}$, and $z_n = 2^n + 3^n$. The dashed state labeled $A_\infty^-$ is an elusive causal state [3]; it is infinitely
preceded, but neither reachable (from the omitted start state) nor recurrent. See App. B.



Let us again study the distribution of symbols suc- 
ceeding histories. Since the process is not Markovian, we 
cannot expect to obtain the causal states by examining 
finite-length histories. And so, we must focus attention 
on semi-infinite histories and their suffixes. 

The presence of synchronizing words [43] makes the
analysis a bit easier. In this example, $w = 0$ and $w = 2$
are minimal synchronizing words and, so, after observing
one of these words, the state of the ε-machine is
known with certainty. ε-Machine unifilarity then guarantees
that on each next symbol we will still know the
state of the machine. This allows us to read the distribution
of $X_0$ directly off the forward ε-machine's outgoing
edges.

Thus, any history ending in symbol 0 will be followed
by symbols 0, 1, or 2 with probability 1/3 each. The
equivalence class of histories containing $X_{:0} = \ldots 0$ will
be denoted forward causal state $A^+$. Looking at the machine,
we see that the distribution of next symbols remains
unchanged whenever we see a 1 from state $A^+$.
So, any history ending in a 0 followed by an arbitrary,
but finite number of 1s also belongs to equivalence class
$A^+$. Similarly, any history ending with 2 will be followed
by symbols 1 or 0 with probability 1/2 each. The equivalence
class of histories ending in 2 will be denoted forward
causal state $B^+$ and, from the machine, we can also see
that $B^+$ includes any history ending with a 2 followed by
an arbitrary, but finite number of 1s. A history consisting
entirely of the symbol 1 is best understood by taking
the limit of finite histories which also consist entirely of
1s. When one does this, the history will be followed by
symbols 0 or 1 with probability 1/2 each. Concretely, for
$X_0 = (0, 1, 2)$ and $k \geq 0$, the conditional distributions
for every valid history are:

$P(X_0 | X_{:0} = \ldots 0 1^k) = (1/3, 1/3, 1/3)$,
$P(X_0 | X_{:0} = \ldots 2 1^k) = (1/2, 1/2, 0)$, and
$P(X_0 | X_{:0} = 1^\infty) = (1/2, 1/2, 0)$.

And, from this, we see that $\mathcal{S}^+ = X_{:0}/\sim^+$ consists of:

$A^+ = \{\ldots 0 1^k\}$ and
$B^+ = \{\ldots 2 1^k, 1^\infty\}$.

The distribution of symbols preceding futures is more
complicated. First, we consider futures beginning with
$1^k 2$, $k \geq 0$. These futures cannot be preceded by symbol
2. The probability of observing a 0 or another 1 preceding
these futures is 2/3 and 1/3, respectively. We denote
the equivalence class of all futures starting with $1^k 2$ as
reverse causal state $B^-$. Now, consider all words starting
with $1^k 0$, an arbitrary number of 1s followed by 0. A
short calculation shows that such words can be preceded
by a 0, 1, or 2 with the probability depending explicitly
on the number of 1s at the beginning of the future. Thus,
there is one reverse causal state for every $k$, and we denote
these states as $A_k^-$. As before, the future consisting
entirely of 1s is most easily understood by taking limits;
one finds that it is not possible to precede the future with
a 0 and that 1 and 2 precede the future with probability
1/2 each. This limiting distribution coincides with
$\lim_{k \to \infty} A_k^-$ and, so, we label its equivalence class $A_\infty^-$.
Formally, for $X_{-1} = (0, 1, 2)$ and $k \geq 0$, the conditional
distributions for every valid future are:

$$P(X_{-1} \mid X_{0:} = 1^k 2 \ldots) = (2/3, 1/3, 0),$$

$$P(X_{-1} \mid X_{0:} = 1^k 0 \ldots) = \frac{\left(2^{k+2},\; 2^{k+1} + 3^{k+1},\; 3^{k+1}\right)}{6\,(2^k + 3^k)}, \text{ and}$$

$$P(X_{-1} \mid X_{0:} = 1^\infty) = (0, 1/2, 1/2).$$

From this, we see that $\mathcal{S}^- = X_{0:}/\sim^-$ consists of:

$A_0^- = \{0 \ldots\}$,
$A_1^- = \{10 \ldots\}$,
$A_2^- = \{110 \ldots\}$,
$\vdots$
$A_k^- = \{1^k 0 \ldots\}$,
$\vdots$
$A_\infty^- = \{1^\infty\}$, and
$B^- = \{1^k 2 \ldots\}$.



Again, we leave it to the reader to verify that, for this 
particular example, a partition of futures into equiva- 
lence classes with respect to the preceding symbol will 
not change when considering longer strings of preceding 
symbols. 
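
The "short calculation" for futures beginning with $1^k 0$ can be reproduced with exact arithmetic. The Python sketch below is an illustration under stated assumptions: the forward machine `FORWARD` is transcribed from the verbal description of $A^+$ and $B^+$ above, its stationary distribution `PI` is supplied by hand and checked in the code, and the function names are hypothetical. Conditioning on the finite prefix suffices here because the symbols 0 and 2 are synchronizing.

```python
from fractions import Fraction as F

# Forward machine as described in the text: state -> {symbol: (next_state, probability)}.
FORWARD = {
    "A": {0: ("A", F(1, 3)), 1: ("A", F(1, 3)), 2: ("B", F(1, 3))},
    "B": {0: ("A", F(1, 2)), 1: ("B", F(1, 2))},
}
PI = {"A": F(3, 5), "B": F(2, 5)}  # assumed stationary distribution, verified below

# Probability flowing into each state; equals PI when PI is stationary.
incoming = {
    s: sum(PI[r] * p for r in FORWARD for (t, p) in FORWARD[r].values() if t == s)
    for s in FORWARD
}

def word_prob(word):
    """Stationary probability of a finite word."""
    total = F(0)
    for start, weight in PI.items():
        state, p = start, weight
        for symbol in word:
            edge = FORWARD[state].get(symbol)
            if edge is None:
                p = F(0)
                break
            state, q = edge
            p *= q
        total += p
    return total

def preceding_distribution(prefix):
    """P(X_{-1} | future begins with `prefix`), ordered as (0, 1, 2)."""
    joint = [word_prob((x,) + tuple(prefix)) for x in (0, 1, 2)]
    norm = sum(joint)
    return tuple(j / norm for j in joint)

def closed_form(k):
    """The distribution quoted in the text for futures beginning with 1^k 0."""
    z = 6 * (2**k + 3**k)
    return (F(2**(k + 2), z), F(2**(k + 1) + 3**(k + 1), z), F(3**(k + 1), z))

for k in range(4):
    print(k, preceding_distribution((1,) * k + (0,)), closed_form(k))
```

Each $k$ gives a distinct distribution, which is why every $A_k^-$ is a separate reverse causal state; the first component also equals $a_k = 2^{k+1}(3 z_k)^{-1}$ from the caption of Fig. 11.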

The reverse causal states can also be obtained by applying
the forward causal-state equivalence relation on
the time-reversed HMM of the forward ε-machine. That
is, $(X_{0:}/\sim^-) = (\widetilde{X}_{:0}/\sim^+)$. For example, reverse causal
state $A_1^-$ contains every future beginning with 10. Alternatively,
we can associate $A_1^-$ with "histories" ($\widetilde{X}_{:0}$) that
end 01. Since the support is reversible, this allows for a
direct comparison to the forward causal-state partition,
and so $A_1^-$ is a subset of forward causal state $A^+$. We
summarize the relationship [48] between the partitions as
follows:

$A^+ = A_0^- \cup A_1^- \cup \cdots \cup A_k^- \cup \cdots$,

$B^+ = B^- \cup A_\infty^-$.

Recall, $\mathcal{M}^+$ and $\mathcal{M}^-$ denote the forward and reverse
DFAs whose structure is defined by the forward and reverse
ε-machines without probabilities. In this example,
$\mathcal{M}^+ \neq \mathcal{M}^-$ since they disagree on the number of states.
However, $\mathcal{M}^-$ is not minimal and would be equal to $\mathcal{M}^+$,
if it were minimized. This means that the support of
the process is reversible: $D^+ = D^-$. Thus, this example
also demonstrates probability-driven irreversibility,
but differs from the example in Sec. VI B 2 which had
$\mathcal{M}^+ = \mathcal{M}^-$.

This example demonstrated that the ε-machines of irreversible
processes can be finite in one direction and
infinite in the other. The process has $C_\mu^+ \approx 0.971$ and
$C_\mu^- \approx 1.589$ and, so once again, we see that it takes more
memory to generate the process from right-to-left than
from left-to-right.

FIG. 12. Structural classification of hidden Markov models:
Presentations within the green ellipse correspond to the
recurrent ε-machines. The shaded area is the subset of recurrent
ε-machines that are exactly synchronizing and, additionally,
have uniformly distributed transition probabilities
on the outgoing edges of each state. This subset defines the
topological ε-machines. Areas in the diagram are not drawn
to scale and only show which classes are contained in other
classes.



C. Survey of Irreversibility 

Reference [3] classified the space of hidden Markov
models in terms of unifilarity, synchronization, and minimality.
Figure 12 reproduces the essential components of
the hierarchy presented there, extending it several ways 

m- 

At the outer-most level, outside the dashed ellipse
in Fig. 12, we have hidden Markov models that are
strictly nonunifilar. So, given the current state and
symbol, there is residual uncertainty in the next state:
$H[R_1 | R_0, X_0] > 0$. Moving inside the dashed ellipse we
encounter the strictly unifilar hidden Markov models for
which this quantity is exactly zero. Unifilarity is an important
property since, among other reasons, it allows
one to calculate the process's entropy rate directly from
the presentation.
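
Both statements can be illustrated with a short Python sketch. It is a hedged example rather than anything taken from Ref. [3]: the unifilar presentation is the forward ε-machine of Fig. 11 as described in Sec. VI B 3, the nonunifilar presentation `NONUNIFILAR` is a hypothetical model added only for contrast, and the entropy-rate formula $h_\mu = \sum_s \pi(s) H[X_0 | R_0 = s]$ is applied only to the unifilar case, where it is valid.

```python
from math import log2

# Presentations as state -> list of (symbol, next_state, probability).
FORWARD = {
    "A": [(0, "A", 1 / 3), (1, "A", 1 / 3), (2, "B", 1 / 3)],
    "B": [(0, "A", 1 / 2), (1, "B", 1 / 2)],
}
# Hypothetical nonunifilar presentation, included only for contrast.
NONUNIFILAR = {
    "A": [(0, "A", 1 / 2), (0, "B", 1 / 2)],
    "B": [(1, "A", 1.0)],
}

def stationary(edges, steps=500):
    """Approximate the stationary state distribution by power iteration."""
    pi = {s: 1 / len(edges) for s in edges}
    for _ in range(steps):
        nxt = dict.fromkeys(edges, 0.0)
        for s, out in edges.items():
            for _symbol, target, p in out:
                nxt[target] += pi[s] * p
        pi = nxt
    return pi

def next_state_uncertainty(edges, pi):
    """H[R_1 | R_0, X_0] in bits; it vanishes exactly for unifilar presentations."""
    total = 0.0
    for s, out in edges.items():
        by_symbol = {}
        for symbol, target, p in out:
            targets = by_symbol.setdefault(symbol, {})
            targets[target] = targets.get(target, 0.0) + p
        for targets in by_symbol.values():
            p_symbol = sum(targets.values())
            total -= pi[s] * sum(p * log2(p / p_symbol) for p in targets.values() if p > 0)
    return total

def entropy_rate(edges, pi):
    """Sum over states of pi(s) H[X_0 | R_0 = s], in bits; valid only if unifilar."""
    total = 0.0
    for s, out in edges.items():
        total -= pi[s] * sum(p * log2(p) for _symbol, _target, p in out if p > 0)
    return total

pi_forward = stationary(FORWARD)
pi_nonunifilar = stationary(NONUNIFILAR)
h_forward = entropy_rate(FORWARD, pi_forward)
print(next_state_uncertainty(FORWARD, pi_forward), h_forward)
print(next_state_uncertainty(NONUNIFILAR, pi_nonunifilar))
```

For the unifilar machine the next-state uncertainty is zero and the entropy rate follows directly from the state-conditioned symbol entropies; for the contrast model the uncertainty is strictly positive, so the same shortcut would not apply.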

However, unifilar hidden Markov models can have a
type of redundancy such that the state is not justified
by the process statistics. Such models have gauge information
$\varphi = H[R_0 | X_{:0}, X_{0:}] > 0$. And, when we restrict
to those with $\varphi = 0$, the hidden Markov models
become asymptotically synchronizing [50]. This class exists
within the dotted ellipse of Fig. 12. One signature
of unifilar models with zero gauge information is that
the state uncertainty vanishes asymptotically for almost
every history in the process language [51].

Within the class of asymptotically synchronizing hid- 
den Markov models, there exists a subset for which the 
state uncertainty vanishes in finite time for almost every 
history in the process language [55] . Such hidden Markov 
models necessarily have at least one synchronizing word. 
In Fig. 12, this is delineated by the blue ellipse.

Another subset within the class of asymptotically syn- 
chronizing hidden Markov models are the minimal unifi- 
lar hidden Markov models. Any hidden Markov model 
with these properties corresponds to an e-machine of a 
process language [53]. This is represented by the green
ellipse in Fig. 12. Generally, the set of ε-machines and the
set of exactly synchronizing hidden Markov models (blue
ellipse) are not the same, and their intersection defines
the class of exactly synchronizing ε-machines.

Reference [3]'s classification of processes and their presentations
provides a natural setting for developing a refined
classification based on the irreversibility properties
just introduced. As a first step, though, it is perhaps 
more helpful to develop a quantitative appreciation of 
how common irreversibility is within the space of hid- 
den Markov models. This is a difficult, if somewhat 
open-ended challenge, but we can make some progress 
by examining several subclasses. Systematically survey- 
ing presentations is generally difficult due to the prob- 
abilistic nature of hidden Markov models and the pro- 
cesses they generate. However, if we restrict ourselves to 
hidden Markov models with uniformly distributed transition
probabilities leaving each state (recall the red,
wavy parabola in Fig. 12), then we can systematically
enumerate them. Essentially, the task boils down to
enumerating a particular class of finite-state automata.
Reference [54] provided an exhaustive enumeration of
exactly synchronizing e-machines with uniformly dis- 
tributed transition probabilities leaving each state. It 
is this class of processes — generated by the topological 
e-machines — that we survey in order to develop an ap- 
preciation of how common irreversibility is within the 
space of hidden Markov models. 

Table II summarizes the survey, giving the number
$N_{n,k}$ of topological ε-machines [53] and the number $C_{n,k}$
of irreversible ε-machines over n states and exactly k symbols
in the alphabet. (By "exactly k symbols" we emphasize
that we excluded from the counts processes with
$k = 3$ that use only 2 symbols, for example.) The immediate
impression is quite striking: Irreversibility dominates.
It comprises over 98% of all topological ε-machines
and their associated processes. Indeed, the fraction of ir- 
reversible e-machines appears to rapidly increase toward 
unity as the number of states increases. And so, what 
might have initially appeared to be a counterintuitive 
property — temporal asymmetry in the statistics of a sta- 
tionary process — is the overwhelming rule in the space of 
processes. 
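
The percentages quoted here follow from simple arithmetic on Table II. The Python sketch below is only an illustration: the counts are transcribed from the table, and the zero entries for single-state machines and for $(n, k) = (2, 2)$ are an assumption made where the extracted table shows no value.

```python
# (n, k) -> (N_{n,k}, C_{n,k}); zero counts are assumed where Table II prints none.
TABLE_II = {
    (1, 2): (1, 0), (1, 3): (1, 0), (1, 4): (1, 0), (1, 5): (1, 0), (1, 6): (1, 0),
    (2, 2): (7, 0), (3, 2): (78, 24), (4, 2): (1388, 1077),
    (5, 2): (35186, 33107), (6, 2): (1132613, 1119623),
    (2, 3): (120, 84), (3, 3): (15364, 14561), (4, 3): (3621474, 3607084),
    (2, 4): (1351, 1200), (3, 4): (1596682, 1586736),
    (2, 5): (12900, 12290),
    (2, 6): (113827, 111390),
}

total_machines = sum(n_machines for n_machines, _ in TABLE_II.values())
total_irreversible = sum(n_irrev for _, n_irrev in TABLE_II.values())
overall_fraction = total_irreversible / total_machines

# Fraction of irreversible binary-alphabet machines as the number of states grows.
binary_fractions = [TABLE_II[(n, 2)][1] / TABLE_II[(n, 2)][0] for n in range(1, 7)]

print(f"overall: {overall_fraction:.4f}")
for n, fraction in enumerate(binary_fractions, start=1):
    print(f"n = {n}, k = 2: {fraction:.4f}")
```

With these counts the overall irreversible fraction exceeds 0.98, and for the binary alphabet the fraction rises steadily with the number of states, consistent with the trend described above.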



VII. THE BIDIRECTIONAL MACHINE 

The process, as a stationary probability space, is a 
bulky abstraction, and state-based models, such as hid- 
den Markov models, are often used to provide a much 
more concise representation. However, the forward and 
reverse generators of a process are not unique, and this 
makes it difficult to separate structure in the process from 
structure in presentations of the process. The entropy
rate $h_\mu$ and excess entropy $\mathbf{E}$ are two well known structural
properties of a process. We showed, in addition,
that crypticity $\chi$, oracular information $\zeta$, and gauge information
$\varphi$ are important structural properties of presentations.

The forward and reverse e-machines were introduced as 
a process's canonical presentations and, in doing so, the 
statistical complexities $C_\mu^+$ and $C_\mu^-$ became process properties
that, in addition, were easily accessible through
these privileged presentations. The e-machines were ideal 
in a number of ways, for example and importantly, they 
provided a direct calculation of a process's entropy rate. 
The excess entropy, however, remained inaccessible and, 
so, a new presentation was required. 

The bidirectional machine, introduced in Refs. [20, 21],
is a generator that unites the forward and reverse 
e-machines, providing an explicit accounting of the re- 
lationship between them [55]. In doing so, the excess 
entropy, a structural property of a process, becomes ac- 
cessible through a simple calculation and, further, the 
bidirectional machine contains all information necessary 
to reconstruct the forward and reverse e-machines. In 
this section, we define the bidirectional machine and in- 
terpret it through an example from the previous section. 



A. Definitions 

The hidden process lattice of Fig. 7 invites us to consider
a dynamic over joint causal states. We define an
aggregate state $S^\pm = (S^+, S^-)$ as the 2-tuple of the forward
and reverse causal states with stationary distribution
function:

$\pi(\alpha\gamma) = P(S^\pm = (\alpha, \gamma))$
$= \pi(\alpha, \gamma)$

for $\alpha \in \mathcal{S}^+$ and $\gamma \in \mathcal{S}^-$. Counter to typical usage, ±
in the joint causal state is interpreted as forward "and"
reverse, rather than "or". Note that we purposefully overload
notation and use $\pi$ again, but it will always be clear
from context to which generator we refer.

n   N_{n,2}    C_{n,2}    N_{n,3}    C_{n,3}    N_{n,4}    C_{n,4}    N_{n,5}  C_{n,5}  N_{n,6}  C_{n,6}
1   1          0          1          0          1          0          1        0        1        0
2   7          0          120        84         1,351      1,200      12,900   12,290   113,827  111,390
3   78         24         15,364     14,561     1,596,682  1,586,736
4   1,388      1,077      3,621,474  3,607,084
5   35,186     33,107
6   1,132,613  1,119,623

TABLE II. The number $N_{n,k}$ of topological ε-machines [54] and the number $C_{n,k}$ of irreversible ε-machines over n states and
exactly k symbols in the alphabet.

Given the (stationary) distribution $\pi$, if we scan left-to-right,
we obtain a forward generator $M^\pm$ of the process.
If we scan right-to-left, we obtain the process's
reverse generator $M^\mp$. These generators are generally
distinct. However, we will see that $M^\mp$ is equal to the
time-reversed HMM of $M^\pm$. That is, $M^\mp = \widetilde{M}^\pm$. For
that reason, we take $M^\pm$ as the starting point.

Having defined the states, the transition matrices for
the forward bidirectional machine $M^\pm$ are given by:

$$
T_x(\alpha\gamma, \beta\delta)
= P\left(X_0 = x, S_1^\pm = (\beta, \delta) \mid S_0^\pm = (\alpha, \gamma)\right)
= \begin{cases}
\widetilde{T}_x^-(\gamma, \delta) & \text{if } T_x^+(\alpha, \beta) > 0, \\
0 & \text{otherwise,}
\end{cases}
\tag{15}
$$

where $\alpha, \beta \in \mathcal{S}^+$ and $\gamma, \delta \in \mathcal{S}^-$. The transition probabilities
of the forward bidirectional machine mimic the transition
probabilities of the time-reversed reverse ε-machine
($\widetilde{M}^-$), provided the transition is allowed in the forward
ε-machine ($M^+$).


To see how Eq. (15) arises, first we note that:

$$
P(X_0, S_1^\pm \mid S_0^\pm)
= P(X_0, S_1^+, S_1^- \mid S_0^+, S_0^-)
= P(S_1^+ \mid S_0^+, S_0^-, X_0, S_1^-)\, P(X_0, S_1^- \mid S_0^+, S_0^-) .
\tag{16}
$$

Following Eq. (15), we take $S_0^+ = \alpha$, $S_1^+ = \beta$, $S_0^- = \gamma$,
$S_1^- = \delta$, and $X_0 = x$. Then, the first factor in Eq. (16) is
either 0 or 1, due to unifilarity of the forward ε-machine,
depending on if $\beta$ is the unique causal state that follows
$\alpha$ on symbol $x$. The presence of $S_0^- = \gamma$ and $S_1^- = \delta$
in the conditional does not change this fact, so long as
$(\gamma, x, \delta)$ is a valid consecutive combination in the reverse
ε-machine, and this is implicitly handled by the second
factor.

The second factor reduces due to the shielding property
of hidden Markov models: the past and future are
independent given the present state. Focusing on the
reverse ε-machine, we express independence formally as:

$P(X_{0:}, S_{1:}^- | X_{:0}, S_{:0}^-, S_0^-) = P(X_{0:}, S_{1:}^- | S_0^-)$,

where $(X_{0:}, S_{1:}^-)$ is everything related to the future and
$(X_{:0}, S_{:0}^-)$ is everything related to the past. Now, we
also know that the forward causal states are determined
by the past: $H[S_0^+ | X_{:0}] = 0$, and this means that the
forward causal state and future are independent given
the reverse causal state $S_0^-$:

$P(X_{0:}, S_{1:}^- | S_0^+, S_0^-) = P(X_{0:}, S_{1:}^- | S_0^-)$.

Restricting to single-step futures, we obtain:

$$P(X_0, S_1^- \mid S_0^+, S_0^-) = P(X_0, S_1^- \mid S_0^-) . \tag{17}$$

Understanding this in terms of previously defined quantities
is subtle precisely due to the shifting notions of
forward and reverse time. Intuitively, Fig. 7 shows that
we are asking the reverse e-machine to move left-to-right. 
This direction is opposed to the reverse e-machine's local 
notion of forward time. Thus, we expect this movement 
from left-to-right to relate to the time-reversed transition 
matrices of the reverse e-machine. 

At a lower level, we note that the definition
of the time-reversed hidden Markov model was stated
under the assumption that the original model's increasing
indexes corresponded to a left-to-right movement on the
lattice. From the labeling in Fig. 7, the reverse ε-machine
does not satisfy this assumption, and a proper translation
of that definition is:

$$
\widetilde{T}_x^-(\gamma, \delta)
= \frac{\pi(\delta)\, T_x^-(\delta, \gamma)}{\pi(\gamma)}
= P(X_0 = x, S_1^- = \delta \mid S_0^- = \gamma) ,
\tag{18}
$$

which is exactly the quantity in question. The result is
that whenever $T_x^+(\alpha, \beta) > 0$, then the transition probability
of the forward (left-to-right) bidirectional machine
is determined by the transition matrices of the time-reversed
reverse ε-machine ($\widetilde{M}^-$).

FIG. 13. Left: Portion of the process lattice relevant to the
bidirectional machine's transition matrices. Right: Realizations
of the process lattice as it applies to Eqs. (19),
(21), (22), and (23).

FIG. 14. Equations (22) (left) and (23) (right) demonstrate
path equivalence. In each, the red and blue paths are equivalent
ways of moving around on the process lattice.

The reverse bidirectional machine $M^\mp$ is analogously
defined by the right-to-left dynamic over the joint causal
states. This requires that the forward ε-machine move
right-to-left on the lattice, a direction that is opposed to
its local sense of forward time. The result is that we use
the time-reversed forward ε-machine ($\widetilde{M}^+$). Similarly,
the reverse ε-machine is required to move right-to-left
on the lattice. This direction is in agreement with its
local sense of forward time and so, we utilize the reverse
ε-machine ($M^-$) as is. For $\alpha, \beta \in \mathcal{S}^+$ and $\gamma, \delta \in \mathcal{S}^-$, we
have:

$$
\widetilde{T}_x(\alpha\gamma, \beta\delta)
= P\left(X_0 = x, S_0^\pm = (\beta, \delta) \mid S_1^\pm = (\alpha, \gamma)\right)
= \begin{cases}
\widetilde{T}_x^+(\alpha, \beta) & \text{if } T_x^-(\gamma, \delta) > 0, \\
0 & \text{otherwise.}
\end{cases}
\tag{19}
$$

The proof proceeds analogously to the forward bidirectional
machine and is omitted here. For future reference,
Fig. 13 displays $\alpha$, $\beta$, $\gamma$, and $\delta$ on the process lattice, as
they are used in the definition of the reverse bidirectional
machine. Thus, we have $S_0^\pm = (\beta, \delta)$ and $S_1^\pm = (\alpha, \gamma)$.
Note, these variables are swapped in the definition of the
forward bidirectional machine.

The choice of $\widetilde{T}$ as the notation for the reverse bidirectional
machine's transition matrices suggests that it is related
to the time-reversal of the forward bidirectional machine.
Indeed, the definition of the forward bidirectional
machine already provides the matrices for the right-to-left
dynamic. Thus, we see that $M^\mp = \widetilde{M}^\pm$:

$$
\widetilde{T}_x(\alpha\gamma, \beta\delta)
= P\left(X_0 = x, S_0^\pm = (\beta, \delta) \mid S_1^\pm = (\alpha, \gamma)\right)
= \frac{\pi(\beta, \delta)\, T_x(\beta\delta, \alpha\gamma)}{\pi(\alpha, \gamma)} .
\tag{20}
$$

Applying Eq. (15) gives the direct relation to the forward
ε-machine:

$$
\widetilde{T}_x(\alpha\gamma, \beta\delta)
= \frac{\pi(\beta, \delta)}{\pi(\alpha, \gamma)}
\begin{cases}
\widetilde{T}_x^-(\delta, \gamma) & \text{if } T_x^+(\beta, \alpha) > 0, \\
0 & \text{otherwise;}
\end{cases}
= \begin{cases}
\dfrac{\pi(\beta | \delta)}{\pi(\alpha | \gamma)}\, T_x^-(\gamma, \delta) & \text{if } \widetilde{T}_x^+(\alpha, \beta) > 0, \\
0 & \text{otherwise.}
\end{cases}
\tag{21}
$$

Comparing Eqs. (19) and (21), we see that whenever
$T_x^-(\gamma, \delta)$ and $\widetilde{T}_x^+(\alpha, \beta)$ are simultaneously positive, then
we have:

$$\pi(\alpha | \gamma)\, \widetilde{T}_x^+(\alpha, \beta) = \pi(\beta | \delta)\, T_x^-(\gamma, \delta) . \tag{22}$$

A complementary relation, obtained by applying Bayes
theorem, is:

$$\pi(\gamma | \alpha)\, T_x^+(\beta, \alpha) = \pi(\delta | \beta)\, \widetilde{T}_x^-(\delta, \gamma) . \tag{23}$$

The interpretations of Eqs. (22) and (23) are properly
framed using the process lattice, as shown in Fig. 13. We
could have also worked with the forward bidirectional
machine, expressing its transition matrix as the Bayes
inverse of $\widetilde{T}$ and, then, equating it to Eq. (15). However,
this does not yield any new insight.

Generally, these equations represent path equivalence
on the process lattice. In Eq. (22), the side
$\pi(\beta | \delta)\, T_x^-(\gamma, \delta)$ describes a path in which
we begin in $S_1^- = \gamma$, transition to $S_0^- = \delta$ on symbol
$X_0 = x$, and then shift to $S_0^+ = \beta$. This path is represented
in red in the right diagram of Fig. 14. The other
side, $\pi(\alpha | \gamma)\, \widetilde{T}_x^+(\alpha, \beta)$, says that the red path is equivalent
(in probability) to the blue path, which also begins in
$S_1^- = \gamma$. However, now it shifts to $S_1^+ = \alpha$ first, and
then reverse transitions to $S_0^+ = \beta$ on symbol $X_0 = x$.

Equation (23) provides an analogous result and is summarized
in the left diagram of Fig. 14. There, we begin
in $S_0^+$ and transition to $S_1^-$ via two equivalent paths.

The bidirectional machines are so-named because their
state space consists of the forward and reverse causal
states and their transition dynamic allows one to go in
either direction. However, the bidirectional machine is
still a one-way generator and this is why there are two
variants, $M^\pm$ and $M^\mp$. These two variants are simply
time-reversals of one another, even if the underlying process
is irreversible. Having established the proper setting
for bidirectional machines, in the next section we analyze
several of their properties and the consequences of this
symmetry.



B. Bi-Infinite Sequence Partitions

The bidirectional machine $M^\pm$ can be understood by
analyzing its effect on the past and future. Previously,
we saw that the forward causal states $\mathcal{S}^+$ partitioned
the semi-infinite histories $X_{:0}$, while the reverse causal
states $\mathcal{S}^-$ partitioned the semi-infinite futures $X_{0:}$. The
bidirectional machine, it turns out, partitions the set of
bi-infinite strings $X_{:} = X_{:0} X_{0:}$. This is expressed by the
bidirectional equivalence relation [21]:

$$
(x_{:0}, x_{0:}) \sim^\pm (x'_{:0}, x'_{0:})
\iff
\epsilon^+(x_{:0}) = \epsilon^+(x'_{:0}) \text{ and } \epsilon^-(x_{0:}) = \epsilon^-(x'_{0:}) .
$$

Thus, the bidirectional causal states $\mathcal{S}^\pm$ are a partition
of bi-infinite strings resulting from the application of an
equivalence relation: $\mathcal{S}^\pm = (X_{:0}, X_{0:})/\sim^\pm$. The mapping
$\epsilon^\pm(\cdot)$ that takes a bi-infinite string to its bidirectional
causal state is defined:

$$
\epsilon^\pm(x_{:0}, x_{0:})
= \{(x'_{:0}, x'_{0:}) : x'_{:0} \in \epsilon^+(x_{:0}) \text{ and } x'_{0:} \in \epsilon^-(x_{0:})\} .
$$

Note that the same equivalence relation is used for the
forward and reverse bidirectional machines. All that differs
is the dynamic over the states.

For $M^\pm$ and $M^\mp$, we use a bi-infinite instance and
shift the time origin to the right (for $M^\pm$) or to the left
(for $M^\mp$). The symbol encountered during the shift is
the symbol generated.

However, given any bidirectional partition, it does not 
follow that the dynamic will be unifilar, and this is pre- 
cisely the case for the bidirectional machines. With 
e-machines, all histories (or futures) in the equivalence 
class have exactly the same distribution over futures (or 
histories). And so, on the next symbol, every history (or 
future) in the causal state transitioned to the same next 
causal state. With the bidirectional machine, this is no 
longer true, and the dynamic over the states is generally 
nonunifilar. 
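
The construction of the bidirectional partition can be sketched in a few lines of Python. This is a minimal illustration, not the paper's procedure: the two labeling functions `eps_plus` and `eps_minus` are hypothetical stand-ins for the forward and reverse causal-state maps, and they look at only one symbol of the past and future, which is an assumption made to keep the example finite.

```python
from collections import defaultdict

def bidirectional_partition(pairs, eps_plus, eps_minus):
    """Group (past, future) pairs by their (forward, reverse) state labels.

    Two pairs land in the same class exactly when their pasts share a
    forward label AND their futures share a reverse label.
    """
    classes = defaultdict(list)
    for past, future in pairs:
        classes[(eps_plus(past), eps_minus(future))].append((past, future))
    return dict(classes)

# Hypothetical labelings (assumptions, not taken from the paper's figures):
# the forward label depends on the last symbol of the past, the reverse
# label on the first symbol of the future.
def eps_plus(past):
    return "A" if past[-1] in "02" else "B"

def eps_minus(future):
    return "C" if future[0] == "0" else "D"

pairs = [(p, f) for p in ("0", "1", "2") for f in ("0", "1")]
classes = bidirectional_partition(pairs, eps_plus, eps_minus)
```

Each pair falls into exactly one class, so the classes form a partition that refines both the forward and the reverse partition. Nothing in this construction guarantees that the induced dynamic over the classes is unifilar.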



C. Properties 

Each of the process and presentation properties dis- 
cussed can be considered operators. That is, given a 
model M, we calculate a quantity relative to the model 
alone, using its local sense of time. This point is worth 
remembering as we discuss properties of the bidirectional 
machines. We will continue, however, to frame the var- 
ious quantities using the bird's eye view of the process 
lattice. 

The stationary distribution for the forward bidirectional
machine is $P(\mathcal{S}^+, \mathcal{S}^-)$ and, as Sec. IV B discussed,
the reverse bidirectional machine has the same stationary
distribution. Using Refs. [20] [H], we can immediately
calculate the excess entropy as $\mathbf{E} = I[\mathcal{S}^+; \mathcal{S}^-]$.
Importantly, this quantity is not calculable given only the
forward and reverse ε-machines. (Alternate methods to
calculate $\mathbf{E}$ end up being essentially equivalent to
invoking the bidirectional machine.)

As mentioned, the bidirectional machine can also be
nonunifilar. Since the bidirectional causal states are the
joint distribution over the forward and reverse causal
states, the bidirectional machine's oracular information
$\zeta(M^\pm)$ is the crypticity $\chi^-$ of the reverse ε-machine.
Additionally, the bidirectional machine's crypticity $\chi(M^\pm)$
is the crypticity $\chi^+$ of the forward ε-machine.

If, instead, we work with the reverse bidirectional
machine $M^\mp$, all the interpretations are flipped. Then
the crypticity $\chi(M^\mp)$ is equal to the reverse ε-machine's
crypticity $\chi^-$, and the oracular information $\zeta(M^\mp)$ is
the forward ε-machine's crypticity $\chi^+$. Recall that
ε-machines do not have oracular information, since they
are unifilar.

These information quantities are summarized in
Fig. 16. There, we see that the reverse bidirectional
machine swaps crypticity and oracular information just as a
general hidden Markov model [3 . 

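Of the presentation quantifiers, this leaves only the
gauge information $\varphi$ to be explained. Recall that the
past $X_{:0}$ completely determines the future causal state
$\mathcal{S}_0^+$ and that the future $X_{0:}$ completely determines the
past causal state $\mathcal{S}_0^-$. Then, this gives:

$$\begin{aligned}
\varphi(M^\pm) &= H[\mathcal{S}^\pm | X_{:0}, X_{0:}] \\
&= H[\mathcal{S}^+, \mathcal{S}^- | X_{:0}, X_{0:}] \\
&= H[\mathcal{S}^+ | X_{:0}, X_{0:}] + H[\mathcal{S}^- | X_{:0}, X_{0:}, \mathcal{S}^+] \\
&\leq H[\mathcal{S}^+ | X_{:0}] + H[\mathcal{S}^- | X_{0:}] \\
&= 0 + 0 .
\end{aligned}$$

Thus, the bidirectional machine does have a certain
representational efficiency: It has no gauge information.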



This is implicitly shown in Fig. 16 but more easily seen
in Fig. 8. There, we see that the ellipse representing the
bidirectional machine's states (the union of $C_\mu^+$ and $C_\mu^-$)
only consists of areas within the entropies of the past
$H[X_{:0}]$ and future $H[X_{0:}]$. Naturally, one wonders if it
is possible to define the bidirectional machine through
constraints. To this end, we conjecture that the
bidirectional machine is the only generator of the process
with zero gauge information that marginalizes into the
forward and reverse ε-machines and, additionally, has
$\chi(M^\pm) = \chi(M^+)$ and $\zeta(M^\pm) = \chi(M^-)$.

We now turn to the various state entropy quantities
that play a role in the bidirectional machine. The state
entropy of the forward and reverse ε-machines
represented the forward and reverse statistical complexities:
$C_\mu^+ = H[\mathcal{S}^+]$ and $C_\mu^- = H[\mathcal{S}^-]$. Similarly, we denote the
state entropy of the forward and reverse bidirectional
machine by $C_\mu^\pm = H[\mathcal{S}^\pm] = H[\mathcal{S}^+, \mathcal{S}^-]$ and call it the
bidirectional statistical complexity. It represents the total
amount of information needed to predict or retrodict
optimally. The key difference between $C_\mu^\pm$ and the directed
statistical complexities is that with the bidirectional
machine, one has a choice in which action, prediction or
retrodiction, is taken [57]. We further note that both $C_\mu^+$
and $C_\mu^-$ play equivalent roles in $C_\mu^\pm$, to the extent that
$\mathbf{E}$ is contained in both. Due to this, we can see that:



$$C_\mu^\pm = C_\mu^+ + C_\mu^- - \mathbf{E} . \tag{24}$$
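
As a hedged numerical illustration of Eq. (24), the sketch below computes the three state entropies and the excess entropy from a joint distribution over forward and reverse causal states. The joint distribution and its state labels are hypothetical: they were chosen only so that the entropies match the values quoted in Sec. VII E ($C_\mu^+ = 1$, $C_\mu^- = 3/2$, $\mathbf{E} = 1/2$, $C_\mu^\pm = 2$ bits) and are not read off the paper's figures.

```python
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginalize a joint distribution keyed by tuples onto one coordinate."""
    out = {}
    for key, p in joint.items():
        out[key[axis]] = out.get(key[axis], 0.0) + p
    return out

# Hypothetical joint distribution P(S+, S-): two forward states (A, B),
# three reverse states (C, D, E), four equally likely bidirectional states.
joint = {("A", "C"): 0.25, ("B", "C"): 0.25, ("A", "D"): 0.25, ("B", "E"): 0.25}

c_pm = entropy(joint)                    # bidirectional statistical complexity
c_plus = entropy(marginal(joint, 0))     # forward statistical complexity
c_minus = entropy(marginal(joint, 1))    # reverse statistical complexity
excess = c_plus + c_minus - c_pm         # E = I[S+; S-], rearranging Eq. (24)
chi_plus = c_plus - excess               # forward crypticity
chi_minus = c_minus - excess             # reverse crypticity
```

The last two lines use the standard relation between statistical complexity, excess entropy, and crypticity. With this toy distribution they give 1/2 and 1 bit.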



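One can also marginalize the bidirectional machine's
transition matrices to recover the forward and reverse
ε-machines. For $\alpha, \beta \in \mathcal{S}^+$ and $\delta, \gamma \in \mathcal{S}^-$, we marginalize
$M^\pm$ to get $M^+$ as follows:

$$T_x(\alpha, \beta) = P(X_0 = x, \mathcal{S}_1^+ = \beta | \mathcal{S}_0^+ = \alpha) = \sum_{\gamma, \delta} \pi(\gamma | \alpha) \, T_x(\alpha\gamma, \beta\delta) ,$$

where $\pi(\gamma | \alpha) = \pi(\alpha\gamma)/\pi(\alpha)$ and $T_x(\alpha\gamma, \beta\delta)$ is given by
Eq. (15). Similarly, we marginalize $M^\mp$ to get $M^-$:

$$T_x(\gamma, \delta) = P(X_0 = x, \mathcal{S}_0^- = \delta | \mathcal{S}_1^- = \gamma) = \sum_{\alpha, \beta} \pi(\alpha | \gamma) \, T_x(\alpha\gamma, \beta\delta) ,$$

where $\pi(\alpha | \gamma) = \pi(\alpha\gamma)/\pi(\gamma)$ and $T_x(\alpha\gamma, \beta\delta)$ is given by
Eq. (19).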
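
The forward marginalization can be sketched as follows. This is an illustrative implementation under stated assumptions: the nested-dictionary layout `T[x][(a, g)][(b, d)]`, the function name `marginalize_forward`, and the three-state toy machine are all hypothetical and are not the machine of Eq. (15).

```python
def marginalize_forward(T, pi):
    """Forward marginal T_x(a, b) = sum_{g, d} pi(g | a) T_x((a, g), (b, d)).

    T[x][(a, g)][(b, d)] is the bidirectional transition probability and
    pi[(a, g)] is the stationary probability of bidirectional state (a, g).
    """
    pi_plus = {}
    for (a, _g), p in pi.items():
        pi_plus[a] = pi_plus.get(a, 0.0) + p
    out = {}
    for x, rows in T.items():
        for (a, g), row in rows.items():
            weight = pi[(a, g)] / pi_plus[a]          # pi(g | a)
            for (b, _d), prob in row.items():
                out[(x, a, b)] = out.get((x, a, b), 0.0) + weight * prob
    return out

# Hypothetical toy machine with bidirectional states AC, AD, BC.
T = {
    "0": {("A", "C"): {("B", "C"): 0.5},
          ("A", "D"): {("A", "C"): 1.0},
          ("B", "C"): {("A", "D"): 0.5}},
    "1": {("A", "C"): {("A", "D"): 0.5},
          ("B", "C"): {("B", "C"): 0.5}},
}
pi = {("A", "C"): 1 / 3, ("A", "D"): 1 / 3, ("B", "C"): 1 / 3}  # stationary for T

forward = marginalize_forward(T, pi)
row_A = sum(p for (x, a, b), p in forward.items() if a == "A")
row_B = sum(p for (x, a, b), p in forward.items() if a == "B")
```

Because the weights $\pi(\gamma | \alpha)$ sum to one for each $\alpha$, every row of the marginalized machine is again a probability distribution over (symbol, next state) pairs. The reverse marginalization follows the same pattern with the roles of the two coordinates exchanged.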



It also happens that knowing the bidirectional causal 
state is not always helpful. Specifically, we have: 

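$$H[X_0 | \mathcal{S}_0^+, \mathcal{S}_0^-] = H[X_0 | \mathcal{S}_0^-] \quad \text{and}$$
$$H[X_{-1} | \mathcal{S}_0^+, \mathcal{S}_0^-] = H[X_{-1} | \mathcal{S}_0^+] .$$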

In other words, a question about the future is best un- 



derstood by something which comes from the future (and 
vice versa for questions about the past). The reason for 
each of these results can be immediately deduced from 
Fig.i 



D. Uses 

The bidirectional machine is also useful in a number of 
ways. We briefly mention several. 

First, we note that $M^\pm$ and $M^\mp$, together, could be
interpreted as a transducer. Given a desired direction 
of time, one can move forward or backward along the 
process lattice. While the transducer viewpoint holds 
for any hidden Markov model, only the bidirectional ma- 
chine allows one to predict or retrodict. To wit, if one 
constructed a transducer using $M^+$ and $\widetilde{M}^+$, then one
could make predictions, but it would not be possible to 
retrodict since the forward causal states are not sufficient 
statistics for the future — they are not suited for retrodic- 
tion. This is precisely the advantage of the bidirectional 
machine, since it tracks both the forward and reverse 
causal states. 

Second, the bidirectional machine allows one to
exactly calculate the persistent mutual information $\mathcal{I}_1$ [58]
over a single-step time interval. Previously available only
through empirical estimates, $\mathcal{I}_1$ is the amount of
information $I[X_{:0}; X_{1:}]$ shared between $X_{:0}$ and $X_{1:}$, ignoring
$X_0$. Note that neither ε-machine can give us the
appropriate distribution over $X_{:0}$ and $X_{1:}$, but the bidirectional
machine can. And so, it allows one to calculate $\mathcal{I}_1$
exactly. Since $X_{:0}$ determines $\mathcal{S}_0^+$ and $X_{1:}$ determines $\mathcal{S}_1^-$,
we can write the shared information as $\mathcal{I}_1 = I[\mathcal{S}_0^+; \mathcal{S}_1^-]$.
The bidirectional machine provides access to the joint
distribution $P(\mathcal{S}_0^+, \mathcal{S}_0^-, X_0, \mathcal{S}_1^+, \mathcal{S}_1^-)$ and from this, we can
calculate $\mathcal{I}_1$ in closed-form.

Finally, Refs. [59, and [50] investigated the binding
information $b_\mu = I[X_0; X_{1:} | X_{:0}]$ and the residual
entropy $r_\mu = H[X_0 | X_{:0}, X_{1:}]$. There, they had to be
computed essentially by brute force. Fortunately, the
bidirectional machine again allows us to compute these
exactly and in a manner similar to that for $\mathcal{I}_1$. We
again replace $X_{:0}$ by $\mathcal{S}_0^+$ and $X_{1:}$ by $\mathcal{S}_1^-$, giving
$b_\mu = I[X_0; \mathcal{S}_1^- | \mathcal{S}_0^+]$ and $r_\mu = H[X_0 | \mathcal{S}_0^+, \mathcal{S}_1^-]$. Here also, the
bidirectional machine's transitions provide the joint
distribution $P(\mathcal{S}_0^+, \mathcal{S}_0^-, X_0, \mathcal{S}_1^+, \mathcal{S}_1^-)$, which can be
manipulated appropriately to compute both $b_\mu$ and $r_\mu$.
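
Once the joint distribution has been marginalized down to $(\mathcal{S}_0^+, X_0, \mathcal{S}_1^-)$, these quantities reduce to sums of marginal entropies. The sketch below illustrates the bookkeeping, assuming a hypothetical toy joint distribution that is not the paper's example. The helper names `H`, `binding_information`, `residual_entropy`, and `persistent_mi` are introduced here for illustration only.

```python
from math import log2

def H(joint, keep):
    """Entropy in bits of the marginal over the coordinate indices in `keep`."""
    marg = {}
    for key, p in joint.items():
        sub = tuple(key[i] for i in keep)
        marg[sub] = marg.get(sub, 0.0) + p
    return -sum(p * log2(p) for p in marg.values() if p > 0)

# Coordinates of each key: 0 = S_0^+, 1 = X_0, 2 = S_1^-.
def residual_entropy(joint):
    return H(joint, (0, 1, 2)) - H(joint, (0, 2))       # H[X0 | S0+, S1-]

def binding_information(joint):
    return (H(joint, (0, 1)) + H(joint, (0, 2))
            - H(joint, (0,)) - H(joint, (0, 1, 2)))     # I[X0; S1- | S0+]

def persistent_mi(joint):
    return H(joint, (0,)) + H(joint, (2,)) - H(joint, (0, 2))  # I[S0+; S1-]

# Hypothetical toy joint distribution (an assumption for illustration).
joint = {
    ("A", "0", "C"): 0.25,
    ("A", "1", "D"): 0.25,
    ("B", "0", "C"): 0.25,
    ("B", "1", "C"): 0.25,
}

r_mu = residual_entropy(joint)
b_mu = binding_information(joint)
h_mu = H(joint, (0, 1)) - H(joint, (0,))                # H[X0 | S0+]
i_1 = persistent_mi(joint)
```

For any joint distribution the decomposition $H[X_0 | \mathcal{S}_0^+] = b_\mu + r_\mu$ holds by the chain rule, which gives a quick consistency check on the computed values.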

In summary, we see that the bidirectional machine 
gives ready access to closed-form calculations for a wide 
range of measures in complex processes. 
E. Example

We close by returning to the irreversible example of 
Fig. 9. Its forward ε-machine has two causal states while
its reverse e-machine has three causal states. When the 
partitions for each e-machine are logically ANDed to- 
gether, we obtain the bidirectional machine's partition 
over bi-infinite strings. 

A compelling visualization of the bidirectional
machine's partition is to superpose the partitions that
appeared in Fig. 9. For example, in the forward ε-machine,
the square corresponding to $X_{-1}X_0 = 21$ was associated
with state A (turquoise). In the reverse ε-machine, the
same square was associated with state D (red-orange).
Together, the same square appears in Fig. 15 as both A
and D.

Continuing superposition, we see that there are four 
bidirectional states and that these four states partition 
all bi-infinite sequences. In particular, bidirectional state 
AD includes any sequences ending with a or 2 and 
beginning with a 1. So, if one learns $C_\mu^\pm$ bits, then one
has the luxury, in this case, of knowing that the next
symbol must be a 1. There is inherent uncertainty in
retrodicting the previous symbol. This is easily verified
in the bidirectional machine $M^\pm$ (left) of Fig. 15.
Finally, Fig. 16 presents the bidirectional machine's
information diagram sans the past $H[X_{:0}]$ and future
$H[X_{0:}]$. The three circles, now drawn to scale, represent
the statistical complexities for the forward and reverse
ε-machines and, also, for the bidirectional machine. Note
that the bidirectional machine's state is simply the
combination of the forward and reverse causal states.
Calculations give $C_\mu^+ = 1$ bit, $C_\mu^- = 3/2$ bits, $\mathbf{E} = 1/2$ bit, and
$C_\mu^\pm = 2$ bits. This yields $\chi^+ = \chi(M^\pm) = \zeta(M^\mp) = 1/2$
bit, $\chi^- = \chi(M^\mp) = \zeta(M^\pm) = 1$ bit and, finally, $\varphi = 0$,
verifying Eq. (24). The bidirectional machine also gives
$\mathcal{I}_1 = 0$, $b_\mu = 1/2$, and $r_\mu = 1/2$. Then, according to Ref.
[57], the entropy rate is $h_\mu = b_\mu + r_\mu = 1$.



VIII. CONCLUSION 

The preceding developed a rather thorough survey of 
reversibility, irreversibility, and time asymmetry — these 
being understood in the sense of analyzing a process's 
statistical and structural properties scanned either in 
forward or in reverse directions with respect to the di- 
rection in which it was given or generated. One result 
was a stark distinction between Markov chains and hid- 
den Markov models. For one, we explored the ability 
of hidden Markov models to finitely represent infinite- 
state Markov chains. This came at a high cost, as we 
noted: The problem of representational degeneracy ap- 



pears. We removed this, however, and so were able to 
present a number of constructive results by using the 
e-machine as a canonical presentation. Considering that 
our field of interest is stationary processes, what we found 
was surprising. First, irreversibility is a dominant prop- 
erty in process space. Second, processes that are finite 
in one direction can explode into infinite-state processes 
in the other. And, third, there is a suite of information- 
theoretic measures, helpfully and constructively captured 
in various information diagrams, that quantitatively dis- 
tinguish structural properties of presentations. 

The net result is a new appreciation of irreversibility 
and a new toolkit for analyzing irreversible processes. 
There are many interesting implications of the long list 
of technical results. To suggest what these might be and 
how they will be applied in the near future, we would like 
to close by returning to the physical motivations called 
out at the beginning. Specifically, we will comment on 
the physical meaning of "hidden" processes, the relation- 
ship between the diverse irreversibility properties of pro- 
cesses and possible physical instantiations, and, finally, 
irreversibility in thermodynamic processes. 

Why hidden processes? During an interaction between 
any two systems, only a portion of each system's inter- 
nal configuration (or state) is presented to or is available 
from the other. On the flip side, not every system can 
take on the full state information of another. In effect, 
each system views the other as a hidden process. More- 
over, in this view measurement is only a special case of 
interaction. The measurement act typically does not pro- 
vide all of the observed system's state. Thus, for mea- 
sured processes or collections of interacting systems one 
should view them and analyze them as inherently hidden 
processes. 

Although the analysis largely stayed at the level of 
probability, statistics, and information, any implementa- 
tion resides in a physical substrate. This simple obser- 
vation leads one to immediately ask, How are the statis- 
tical and structural properties and classifications of irre- 
versibility related to the organization of a physical sub- 
strate? The direct technical answer is that each atom in 
the process's information-measure sigma algebra is asso- 
ciated with particular degrees of freedom, structures, and 
behaviors in a physical implementation. The connection 
can be made constructively: One of the longest-standing 
methods to map between continuous-state physical sys- 
tems and sequences is given by symbolic dynamics [62]. 

In light of the preceding structural classifications, one 
now sees that the range of alternate presentations for a 
process parallels and constrains the range of its possible 
physical implementations. In this, each different presen- 
tation comes with its own distinct set of properties — 
redundancy, crypticity, oracular information, and the 
FIG. 15. The bidirectional machine $M^\pm$ (left) has causal states $\mathcal{S}^\pm = (\mathcal{S}^+, \mathcal{S}^-)$ that partition bi-infinite sequences $X_{:}$ of the
causally irreversible process of Fig. 9 (right). In this case, it is sufficient to partition sequences using only $(X_{-1}, X_0)$. However,
when used as a forward (or reverse) generator, the states of the resulting hidden Markov model $M^\pm$ do not correspond to a
partition of the pasts (or futures) since the machine is nonunifilar, as is directly checked in the state-transition diagram.

FIG. 16. A quantitatively scaled information diagram for the
bidirectional machine of Fig. 15. The bidirectional states
combine the forward and reverse causal states and are represented
by the black, encompassing line. Since the forward (blue) and
reverse (green) statistical complexities lie completely within
the past and future respectively, the bidirectional machine has
no gauge information: $\varphi(M^\pm) = 0$.



like. In short, then, to study a process's presentations, 
to classify them, and to metrize their properties is to 
study fundamental properties of the associated physical 
implementations. 

Of course, more is required to complete the mapping 
from a presentation's intrinsic computation to the re- 
quired physics. For example, what is the entailed dissi- 
pation? This reminds one, naturally, of Landauer's Prin- 
ciple: A computation's logical irreversibility is a lower 
bound on the required amount of energy dissipation in 
the physical implementation [53] . To the extent that dy- 
namical irreversibility and crypticity control logical irre- 
versibility, then they also put a lower bound on the phys- 
ical implementation's rate of energy dissipation. More 
generally, the development above gives a qualitative lower 
bound on the richness available and a wide range of ap- 
plications. 

As noted in the Introduction, irreversibility is com- 



monly interpreted as a transient relaxation process. For 
example, isolated thermodynamic systems move to equi- 
librium since, according to Boltzmann, there are over- 
whelmingly more microstates associated with the equilib- 
rium macrostate. This is concisely monitored via the in- 
crease in thermodynamic entropy during relaxation from 
an ordered state. It is enshrined in the Second Law of 
Thermodynamics. However, as we showed, relaxation is 
not the only kind of irreversibility that a thermodynamic 
system can exhibit. There are also irreversibilities, as we 
analyzed in detail, within nonequilibrium steady states 
or, equivalently, within general stationary stochastic pro- 
cesses. There, a thermodynamic system is still a process, 
behaving in time. It is the structure of this temporal be- 
havior that leads to dynamical irreversibility within the 
set of configuration trajectories — the temporally invari- 
ant set consistent with being in a nonequilibrium steady 
state. The preceding gave a new view of just what these 
structures are, what irreversibility means in hidden pro- 
cesses, and a general classification scheme for dynami- 
cally reversible and irreversible processes. 

Concretely, recent explorations of thermodynamic
irreversibility and energy dissipation [64-66] ignore
distinctions that are critical for properly identifying statisti-
cal irreversibility and intrinsic computation, as laid out 
here. Thus, the preceding developments provide a de- 
tailed analysis that will help these efforts by rectifying 
and grounding these notions, particularly in terms of the 
possible physical instantiations of dynamical irreversibil- 
ity. 

Our analysis of how the past and future are contained 
in the present is addressed to a complex world in which 
structure and randomness co-exist: 

Time present and time past 

Are both perhaps present in time future. 

And time future contained in time past. 
T. S. Eliot, Burnt Norton, No. 1 of Four Quartets.

In considering general stochastic processes, though, the 
analysis moves substantially beyond the deterministic 
world of Laplace's omniscient Daemon, where initial data 
is exactly preserved for all times, past and future. Eliot 
aptly summarizes our exploration of irreversible pro- 
cesses, their pasts and futures, and the role the bidirec- 
tional machine plays in capturing the structured present. 
Appendix A: Markov Order is Time Symmetric

The principal goal here is to review the properties 
of Markov processes so that we can establish the time- 
symmetry of the Markov order. 

Definition 1. A process $\mathcal{P}$ is order-$R$ Markov if and
only if:

$$P(X_0 | X_{:0}) = P(X_0 | X_{-R:0}) . \tag{A1}$$

If $\mathcal{P}$ is order-$R$ Markov, then it is also order-$R'$ Markov
for $R' > R$. However, it is common to refer to the
smallest such $R$ as the Markov order.

Lemma 1. If a process $\mathcal{P}$ is order-$R$ Markov, then the
future depends only on the last $R$ symbols; that is,

$$P(X_{0:L} | X_{:0}) = P(X_{0:L} | X_{-R:0}) . \tag{A2}$$

Proof. By a simple application of the chain rule, we
have:

$$\begin{aligned}
P(X_{0:L} | X_{:0}) &= \prod_{t=0}^{L-1} P(X_t | X_{:t}) \\
&= \prod_{t=0}^{L-1} P(X_t | X_{-R:0}, X_{0:t}) \\
&= P(X_{0:L} | X_{-R:0}) . \qquad \square
\end{aligned}$$

The result generalizes. The probability of any
combination of random variables in the future given the entire
past is the same as when given only the last $R$ symbols.

Note that the Markov definition is not time symmetric. 
This invites another notion of Markovity. 

Definition 2. A process $\mathcal{P}$ is order-$R$ reverse-Markov if
and only if:

$$P(X_{-1} | X_{0:}) = P(X_{-1} | X_{0:R}) . \tag{A3}$$

Lemma 2. If a process $\mathcal{P}$ is order-$R$ reverse-Markov,
then the past depends only on the first $R$ symbols:

$$P(X_{-L:0} | X_{0:}) = P(X_{-L:0} | X_{0:R}) . \tag{A4}$$



It happens that the Markov order and reverse Markov 
orders are always equal. 

Theorem 1. A process $\mathcal{P}$ is order-$R$ Markov if and only
if it is order-$R$ reverse-Markov.

Proof. We assume $\mathcal{P}$ is order-$R$ Markov, and then show
that $\mathcal{P}$ is order-$R$ reverse-Markov as well. Recall that
any joint distribution can be forward factored as:

$$P(X_{a:b}) = \prod_{t=a}^{b-1} P(X_t | X_{a:t}) .$$

If the process is Markovian and $(b - a) > R$, then this
factoring simplifies to:

$$P(X_{a:b}) = \prod_{t=a}^{a+R} P(X_t | X_{a:t}) \prod_{u=a+R+1}^{b-1} P(X_u | X_{u-R:u}) .$$

Next, we have:

$$\begin{aligned}
P(X_{-1:L}) &= \prod_{t=-1}^{R-1} P(X_t | X_{-1:t}) \prod_{u=R}^{L-1} P(X_u | X_{u-R:u}) \\
&= P(X_{-1:R}) \prod_{u=R}^{L-1} P(X_u | X_{u-R:u}) \\
&= P(X_{-1} | X_{0:R}) \, P(X_{0:R}) \prod_{u=R}^{L-1} P(X_u | X_{u-R:u}) .
\end{aligned}$$

Similarly, we have:

$$\begin{aligned}
P(X_{0:L}) &= \prod_{t=0}^{R} P(X_t | X_{0:t}) \prod_{u=R+1}^{L-1} P(X_u | X_{u-R:u}) \\
&= P(X_{0:R+1}) \prod_{u=R+1}^{L-1} P(X_u | X_{u-R:u}) \\
&= P(X_{0:R}) \prod_{u=R}^{L-1} P(X_u | X_{u-R:u}) .
\end{aligned}$$

So, finally, we obtain the desired result:

$$P(X_{-1} | X_{0:L}) = \frac{P(X_{-1:L})}{P(X_{0:L})} = P(X_{-1} | X_{0:R}) .$$

In the other direction, we use the reverse factoring of a
joint distribution:

$$P(X_{a:b}) = \prod_{t=a}^{b-1} P(X_t | X_{t+1:b}) .$$

Then, we assume the process is reverse-Markov to obtain:

$$P(X_{a:b}) = \prod_{t=a}^{b-R-2} P(X_t | X_{t+1:t+1+R}) \prod_{u=b-R-1}^{b-1} P(X_u | X_{u+1:b}) .$$

$$P(X_{-L:1}) = P(X_0 | X_{-R:0}) \, P(X_{-R:0}) \prod_{t=-L}^{-(R+1)} P(X_t | X_{t+1:t+1+R})$$

and

$$P(X_{-L:0}) = P(X_{-R:0}) \prod_{t=-L}^{-(R+1)} P(X_t | X_{t+1:t+1+R}) .$$

Then,

$$P(X_0 | X_{-L:0}) = \frac{P(X_{-L:1})}{P(X_{-L:0})} = P(X_0 | X_{-R:0}) .$$

The results hold for every $L > R$ and in the $L \to \infty$
limit, too. □
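
Theorem 1 can be spot-checked numerically. The sketch below uses a hypothetical stationary binary Markov chain of order $R = 1$ (the transition probabilities are arbitrary assumptions) and compares retrodiction from a length-$L$ future against retrodiction from only its first symbol.

```python
from itertools import product

# Hypothetical order-1 chain: T[i][j] = P(X_{t+1} = j | X_t = i).
T = [[0.9, 0.1], [0.4, 0.6]]
# Stationary distribution of a two-state chain, in closed form.
pi = [T[1][0] / (T[0][1] + T[1][0]), T[0][1] / (T[0][1] + T[1][0])]

def word_prob(word):
    """Stationary probability of observing the consecutive symbols in `word`."""
    p = pi[word[0]]
    for a, b in zip(word, word[1:]):
        p *= T[a][b]
    return p

def retrodict(prev, future):
    """P(X_{-1} = prev | X_{0:L} = future), computed from word probabilities."""
    num = word_prob((prev,) + tuple(future))
    den = sum(word_prob((s,) + tuple(future)) for s in (0, 1))
    return num / den

# Largest difference between conditioning on the whole future and
# conditioning on its first R = 1 symbols, over all futures up to length 5.
max_gap = max(
    abs(retrodict(prev, fut) - retrodict(prev, fut[:1]))
    for L in range(1, 6)
    for fut in product((0, 1), repeat=L)
    for prev in (0, 1)
)
```

Up to floating-point rounding, `max_gap` is zero: the previous symbol depends on the future only through its first symbol, as the reverse-Markov property of order 1 requires.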

The two notions of Markovity relate to forward and 
reverse generators. 

Lemma 3. The forward generator $M^+$ is order-$R$
Markov if and only if the reverse generator is order-$R$
reverse-Markov.

Proof. This follows directly from the definition of the
reverse process. Assume $M^+$ has Markov order $R$. Let
$|u| = L - 2R$ and $|w| = |v| = R$. Then,

$$\begin{aligned}
P(X_{-1} | X_{0:L} = wuv) &= P(X_1 | X_{-L+1:1} = vuw) \\
&= P(X_1 | X_{-R+1:1} = w) \\
&= P(X_{-1} | X_{0:R} = w) . \qquad \square
\end{aligned}$$

With this interpretation, it is a short step to see that 
the Markov order is reversible. 

Corollary 1. The forward generator is order-$R$ Markov
if and only if the reverse generator is order-$R$ Markov.

Proof. Apply Thm. 1 and then Lem. 3. □
Appendix B: The Explosive Example Revisited



In Sec. VI B 3 we examined a causally irreversible
process whose forward ε-machine had two causal states,
while its reverse ε-machine had a countable infinity of
causal states. Here, we provide details for calculating
this reverse ε-machine from the forward ε-machine. We
give expressions for the excess entropy and statistical
complexities. A detailed analysis of the various kinds
of causal states (recurrent, transient, and elusive) for
the forward and reverse ε-machines appears in Fig. 17
and gives some insight into the origins of the reverse
ε-machine's infinite number of causal states.

The forward and reverse ε-machines are shown in
Fig. 17. The entropy rate, since it is reversible [69], is
easier to calculate from $M^+$. This is given directly:

h^ = H[Xo\S+] 



2 
1.350 955 500 432 . 



log2 3 



(Bl) 



The forward statistical complexity is:

$$C_\mu^+ = \frac{3}{5} \log_2 \frac{5}{3} + \frac{2}{5} \log_2 \frac{5}{2} \approx 0.970\,950\,594\,455 . \tag{B2}$$



For n > 0, the mixed-state operator [21] acting on M^ 
gives: 



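$$P(\mathcal{S}^+ | \mathcal{S}^- = A_{n-1}^-) = \left( \frac{3 \cdot 2^n}{3 \cdot 2^n + 2 \cdot 3^n}, \; \frac{2 \cdot 3^n}{3 \cdot 2^n + 2 \cdot 3^n} \right)$$

and

$$P(\mathcal{S}^+ | \mathcal{S}^- = B^-) = (1, 0) .$$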

As it turns out, these distributions are also the mixed
states for the transient causal states of $M^+$ in the basis
of its recurrent states. That is:

$$P(\mathcal{S}_0^+ | \mathcal{S}_0^+ = D_n^+) = P(\mathcal{S}_0^+ | \mathcal{S}_0^- = A_{n-1}^-) .$$

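To determine $\pi(\mathcal{S}^-)$ we solve the following simultaneous
equations:

$$\begin{aligned}
\pi(B^-) &= \frac{1}{3} \pi(B^-) + \sum_{n=0}^{\infty} c_n \pi(A_n^-) \\
\pi(A_0^-) &= \frac{2}{3} \pi(B^-) + \sum_{n=0}^{\infty} a_n \pi(A_n^-) \\
\pi(A_n^-) &= b_{n-1} \pi(A_{n-1}^-) , \quad n > 0 .
\end{aligned}$$

Beginning with the third, we have:

$$\pi(A_n^-) = b_{n-1} \pi(A_{n-1}^-) = \left( \prod_{m=0}^{n-1} b_m \right) \pi(A_0^-)$$

for $n > 0$. Then, solving for $\pi(B^-)$, gives:

$$\pi(B^-) = \frac{3}{2} \sum_{n=0}^{\infty} c_n \pi(A_n^-) .$$

So,

$$\pi(B^-) = \frac{3}{4} \pi(A_0^-) .$$

The normalization constraint becomes:

$$1 = \pi(B^-) + \sum_{n=0}^{\infty} \pi(A_n^-) .$$

Thus,

$$\pi(B^-) = \frac{3}{10} \quad \text{and} \quad \pi(A_n^-) = \frac{2^n + 3^n}{5 \cdot 2^n 3^n} .$$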
Collecting these together, we find: 

f2\ 



w 1.588 621 621 714 



5-2 



3^ \ log, ( ^^aI^ 



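Finally,

$$\begin{aligned}
\mathbf{E} &= C_\mu^+ - H[\mathcal{S}^+ | \mathcal{S}^-] \\
&= C_\mu^+ - \sum_{n=0}^{\infty} \frac{2^n + 3^n}{5 \cdot 2^n 3^n} \, H\!\left( \frac{3 \cdot 2^{n+1}}{3 \cdot 2^{n+1} + 2 \cdot 3^{n+1}} \right) \\
&\approx 0.304\,159\,734\,344 ,
\end{aligned}$$

where $H(\cdot)$ is the binary entropy function.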
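
The quoted values can be cross-checked numerically. The sketch below is not the authors' calculation. It assumes the stationary distribution $\pi(B^-) = 3/10$, $\pi(A_n^-) = (2^n + 3^n)/(5 \cdot 6^n)$ and the mixed states $P(\mathcal{S}^+ | A_n^-) = (2^n, 3^n)/(2^n + 3^n)$, $P(\mathcal{S}^+ | B^-) = (1, 0)$, and it truncates the infinite sums at `N` terms (an arbitrary cutoff; the tail decays geometrically).

```python
from math import log2

def h2(p):
    """Binary entropy in bits."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

N = 200  # truncation point for the infinite sums
z = [2**n + 3**n for n in range(N)]

# Assumed stationary distribution over the reverse causal states.
pi_B = 3 / 10
pi_A = [z[n] / (5 * 6**n) for n in range(N)]

# Assumed mixed states: probability of the first forward state given A_n^-.
# The state B^- maps to that forward state with certainty.
p_first = [2**n / z[n] for n in range(N)]

total = pi_B + sum(pi_A)                                  # normalization
forward_first = pi_B + sum(w * p for w, p in zip(pi_A, p_first))
C_plus = h2(forward_first)                                # forward complexity
H_cond = sum(w * h2(p) for w, p in zip(pi_A, p_first))    # H[S+ | S-]
E = C_plus - H_cond                                       # excess entropy
```

Under these assumptions the distribution normalizes, the forward marginal comes out as $(3/5, 2/5)$, and `C_plus` and `E` agree with the values quoted in Eq. (B2) and in the expression for $\mathbf{E}$.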







FIG. 17. The forward ε-machine $M^+$ (top) has only two recurrent (shaded) causal states $A^+$ and $B^+$. The reverse ε-machine $M^-$
(bottom) has an infinite number of recurrent causal states. Transition labels in both machines make use of: $a_n = 2^{n+1}(3 z_n)^{-1}$,
$b_n = 1 - (a_n + c_n)$, $c_n = 3^n (2 z_n)^{-1}$, $d_n = 1 - 2 b_n$, and $z_n = 2^n + 3^n$. The dashed state labeled A'^ is an elusive causal
state H]: It is infinitely preceded, but neither reachable nor recurrent. The hexagon-shaped states are strictly transient states
and only induced by finite-length histories. Note, the limit of the $D_n^+$ states is $D_\infty^+ = B^+$ and it was drawn separately only to
demonstrate the trend.



